iopipe v0.0.4 - RingBuffers!

bioinfornatics bioinfornatics at fedoraproject.org
Mon May 14 10:02:07 UTC 2018


On Thursday, 10 May 2018 at 23:22:02 UTC, Steven Schveighoffer 
wrote:
> OK, so at dconf I spoke with a few very smart guys about how I 
> can use mmap to make a zero-copy buffer. And I implemented this 
> on the plane ride home.
>
> However, I am struggling to find a use case for this that 
> showcases why you would want to use it. While it does work, and 
> works beautifully, it doesn't show any measurable difference 
> vs. the array allocated buffer that copies data when it needs 
> to extend.
>
> If anyone has any good use cases for it, I'm open to 
> suggestions. Something that is going to potentially increase 
> performance is an application that needs to keep the buffer 
> mostly full when extending (i.e. something like 75% full or 
> more).
>
> The buffer is selected by using `rbufd` instead of just `bufd`. 
> Everything should be a drop-in replacement except for that.
>
> Note: I have ONLY tested on Macos, so if you find bugs in other 
> OSes let me know. This is still a Posix-only library for now, 
> but more on that later...
>
> As a test for Ring buffers, I implemented a simple "grep-like" 
> search program that doesn't use regex, but phobos' canFind to 
> look for lines that match. It also prints some lines of 
> context, configurable on the command line. The lines of context 
> I thought would show better performance with the RingBuffer 
> than the standard buffer since it has to keep a bunch of lines 
> in the buffer. But alas, it's roughly the same, even with large 
> number of lines for context (like 200).
>
> However, this example *does* show the power of iopipe -- it 
> handles all flavors of unicode with one template function, is 
> quite straightforward (though I want to abstract the line 
> tracking code, that stuff is really tricky to get right). Oh, 
> and it's roughly 10x faster than grep, and a bunch faster than 
> fgrep, at least on my machine ;) I'm tempted to add regex 
> processing to see if it still beats grep.
>
> Next up (when my bug fix for dmd is merged, see 
> https://issues.dlang.org/show_bug.cgi?id=17968) I will be 
> migrating iopipe to depend on 
> https://github.com/MartinNowak/io, which should unlock Windows 
> support (and I will add RingBuffer Windows support at that 
> point).
>
> Enjoy!
>
> https://github.com/schveiguy/iopipe
> https://code.dlang.org/packages/iopipe
> http://schveiguy.github.io/iopipe/
>
> -Steve

Hi Steve,

This is exciting work that could help in the bioinformatics area.
Indeed, in bioinformatics we are I/O bound and we process a lot 
of big files; the amount of data can be in gigabytes, terabytes 
and sometimes even petabytes.

So processing this amount of data efficiently is critical. Some 
years ago I got a request 'How to parse fastq file format in D?' 
and monarch_dodra wrote a really fast parser (code: 
http://dpaste.dzfl.pl/37b893ed )

It could be an interesting way to show how fast iopipe is.

You can grab a fastq file from 
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG00096/sequence_read/ 
and take a look at iopipe's performance.

A fastq file is a plain text format and is usually a repetition 
of four lines:
1/ title and description
   this line starts with @
2/ sequence line
   this line usually contains DNA letters (ACGT)
3/ comment line
   this line starts with +
4/ quality line
   this line holds one quality character per base, so it has the 
   same length as the sequence line (n°2)
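
The four-line layout above can be sketched in a few lines. This is only an illustration in Python, not the D parser linked earlier and not iopipe code; the names `FastqRecord` and `parse_record` and the sample read are hypothetical, and it assumes the simple case where the comment fits on one line.

```python
from typing import NamedTuple


class FastqRecord(NamedTuple):
    title: str     # line 1, without the leading '@'
    sequence: str  # line 2
    comment: str   # line 3, without the leading '+'
    quality: str   # line 4, same length as the sequence


def parse_record(text: str) -> FastqRecord:
    """Parse one simple four-line record (no multi-line comment)."""
    title, sequence, comment, quality = text.rstrip("\n").split("\n")
    if not title.startswith("@"):
        raise ValueError("title line must start with '@'")
    if not comment.startswith("+"):
        raise ValueError("comment line must start with '+'")
    if len(quality) != len(sequence):
        raise ValueError("quality and sequence lengths differ")
    return FastqRecord(title[1:], sequence, comment[1:], quality)


# Made-up sample read; its quality line deliberately contains '@' and '+'.
record = parse_record("@read1 sample\nACGT\n+\n!@+I\n")
```

The length check on the last line is the only thing that ties the quality line to the sequence line.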

Rarely, the comment section spans multiple lines.
Warning: the @ and + characters can also be found inside the 
quality line, so I search for the two-character patterns '\n@' 
and '\n+'. I never split the file by line, as that is a waste of 
time; instead I read the content as a stream.
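
A stream scan of this kind might look like the sketch below. It is in Python for brevity and is not the linked D parser or iopipe code; `fastq_records` and `_parse_at` are hypothetical names. It assumes the sequence, comment and quality each sit on a single line. The first '\n+' after the title ends the sequence, and the quality is then taken by length, so a quality line that starts with '@' is not mistaken for a new record.

```python
from typing import Iterable, Iterator, Optional, Tuple

Record = Tuple[bytes, bytes, bytes]  # (title, sequence, quality)


def _parse_at(buf: bytes, pos: int) -> Optional[Tuple[Record, int]]:
    """Parse one record at pos; return None if buf holds only part of it."""
    if buf[pos:pos + 1] != b"@":
        raise ValueError("record must start with '@'")
    title_end = buf.find(b"\n", pos)
    if title_end < 0:
        return None
    # The sequence holds only letters, so the first '\n+' ends it.
    seq_end = buf.find(b"\n+", title_end)
    if seq_end < 0:
        return None
    comment_end = buf.find(b"\n", seq_end + 1)
    if comment_end < 0:
        return None
    sequence = buf[title_end + 1:seq_end]
    qual_start = comment_end + 1
    qual_end = qual_start + len(sequence)  # quality is taken by length
    if len(buf) <= qual_end:
        return None
    if buf[qual_end:qual_end + 1] != b"\n":
        raise ValueError("quality length differs from sequence length")
    title = buf[pos + 1:title_end]
    return (title, sequence, buf[qual_start:qual_end]), qual_end + 1


def fastq_records(chunks: Iterable[bytes]) -> Iterator[Record]:
    """Yield records from a stream of byte chunks of any size."""
    buf = b""
    for chunk in chunks:
        buf += chunk
        pos = 0
        while pos < len(buf):
            parsed = _parse_at(buf, pos)
            if parsed is None:
                break  # incomplete record: wait for the next chunk
            record, pos = parsed
            yield record
        buf = buf[pos:]  # keep only the unparsed tail
    if buf:
        # The last record may lack a trailing newline.
        parsed = _parse_at(buf + b"\n", 0)
        if parsed is None:
            raise ValueError("truncated record at end of stream")
        yield parsed[0]


# Made-up data: the first quality line starts with '@' and contains '+'.
data = b"@r1\nACGT\n+\n@+II\n@r2 desc\nGG\n+r2\n!!\n"
chunks = [data[i:i + 5] for i in range(0, len(data), 5)]
records = list(fastq_records(chunks))
```

This sketch copies the unparsed tail once per chunk, which is the kind of copy a ring buffer is meant to avoid.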

I hope this use case helps you.

Good luck :-)



More information about the Digitalmars-d-announce mailing list