iopipe v0.0.4 - RingBuffers!

Steven Schveighoffer schveiguy at yahoo.com
Mon May 14 14:23:43 UTC 2018


On 5/14/18 6:02 AM, bioinfornatics wrote:
> On Thursday, 10 May 2018 at 23:22:02 UTC, Steven Schveighoffer wrote:
>> OK, so at dconf I spoke with a few very smart guys about how I can use 
>> mmap to make a zero-copy buffer. And I implemented this on the plane 
>> ride home.
>>
>> However, I am struggling to find a use case for this that showcases 
>> why you would want to use it. While it does work, and works 
>> beautifully, it doesn't show any measurable difference vs. the array 
>> allocated buffer that copies data when it needs to extend.
>>
>> If anyone has any good use cases for it, I'm open to suggestions. 
>> Something that is going to potentially increase performance is an 
>> application that needs to keep the buffer mostly full when extending 
>> (i.e. something like 75% full or more).
>>
>> The buffer is selected by using `rbufd` instead of just `bufd`. 
>> Everything should be a drop-in replacement except for that.
>>
>> Note: I have ONLY tested on macOS, so if you find bugs on other OSes 
>> let me know. This is still a Posix-only library for now, but more on 
>> that later...
>>
>> As a test for Ring buffers, I implemented a simple "grep-like" search 
>> program that doesn't use regex, but phobos' canFind to look for lines 
>> that match. It also prints some lines of context, configurable on the 
>> command line. The lines of context I thought would show better 
>> performance with the RingBuffer than the standard buffer since it has 
>> to keep a bunch of lines in the buffer. But alas, it's roughly the 
>> same, even with a large number of context lines (like 200).
>>
>> However, this example *does* show the power of iopipe -- it handles 
>> all flavors of Unicode with one template function, is quite 
>> straightforward (though I want to abstract the line tracking code, 
>> that stuff is really tricky to get right). Oh, and it's roughly 10x 
>> faster than grep, and a bunch faster than fgrep, at least on my 
>> machine ;) I'm tempted to add regex processing to see if it still 
>> beats grep.
>>
>> Next up (when my bug fix for dmd is merged, see 
>> https://issues.dlang.org/show_bug.cgi?id=17968) I will be migrating 
>> iopipe to depend on https://github.com/MartinNowak/io, which should 
>> unlock Windows support (and I will add RingBuffer Windows support at 
>> that point).
>>
>> Enjoy!
>>
>> https://github.com/schveiguy/iopipe
>> https://code.dlang.org/packages/iopipe
>> http://schveiguy.github.io/iopipe/
>>
> 
> Hi Steve,
> 
> This is exciting work that could help in the bioinformatics area.
> Indeed, in bioinformatics we are I/O bound and we process a lot of big 
> files; the amount of data can be in gigabytes, terabytes, and sometimes 
> even petabytes.
> 
> So processing this amount of data efficiently is critical. Some years ago 
> I got a request 'How to parse fastq file format in D?' and monarch_dodra 
> wrote a really fast parser (code: http://dpaste.dzfl.pl/37b893ed )
> 
> It would be interesting to show how fast iopipe is.

Yeah, I have been working on and off with Vang Le (biocyberman) on using 
iopipe to parse such formats. He gave a good presentation at dconf this 
year on using D in bioinformatics, and I think the field is a great fit for D!

At dconf, I threw together a crude FASTA parser (with the intention of 
having it be the base for parsing FASTQ as well) to demonstrate how 
iopipe can perform while parsing such things. I have no idea how fast or 
slow it is, as I just barely got it to work (it passes unit tests I made 
up based on the Wikipedia entry for FASTA). But IMO, the direct buffer 
access makes fast parsing much more pleasant than having to deal with 
your own buffering. That said, using Phobos makes the parsing itself a 
bit difficult, so I still see a need for some parsing tools for iopipe.

You can find that library here: https://github.com/schveiguy/fastaq
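
To give a rough idea of what "direct buffer access" means when parsing 
FASTA, here is a minimal sketch. It is not the fastaq code and it does 
not use iopipe or D at all: it is plain Python over a buffer that is 
assumed to be fully in memory already, and the name `fasta_records` is 
made up for illustration. Every header and sequence line it yields is a 
`memoryview` slice of the original buffer, so nothing is copied until 
the caller asks for it.

```python
from typing import Iterator, List, Tuple


def fasta_records(buf: bytes) -> Iterator[Tuple[memoryview, List[memoryview]]]:
    """Yield (header, sequence_lines) pairs as zero-copy views into `buf`.

    Hypothetical helper for illustration only; assumes the whole input
    is already in memory and that records start with '>'.
    """
    view = memoryview(buf)
    header = None
    lines: List[memoryview] = []
    pos, end = 0, len(buf)
    while pos < end:
        nl = buf.find(b"\n", pos)
        stop = end if nl == -1 else nl
        next_pos = stop + 1
        # Trim a trailing carriage return so CRLF files also work.
        if stop > pos and buf[stop - 1] == 0x0D:
            stop -= 1
        if stop > pos:  # blank lines are skipped
            if buf[pos] == 0x3E:  # '>' starts a new record
                if header is not None:
                    yield header, lines
                header, lines = view[pos + 1:stop], []
            elif header is not None:
                # Sequence line: keep a slice of the buffer, not a copy.
                lines.append(view[pos:stop])
        pos = next_pos
    if header is not None:
        yield header, lines


if __name__ == "__main__":
    data = b">seq1 demo\nACGT\nTTGA\n>seq2\nGGCC\n"
    for header, seq_lines in fasta_records(data):
        # Copy out of the buffer only here, when printing.
        print(bytes(header).decode(), sum(len(s) for s in seq_lines))
```

The sketch sidesteps buffer management entirely by assuming the input 
fits in memory; handling a buffer that has to grow while records are 
still being referenced is exactly the part a buffering library has to 
take care of.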

Not being in the field of bioinformatics, I can't really say that I am 
likely to continue development of it, but I'm certainly willing to help 
with iopipe for anyone who wants to use it in this field.

-Steve

