Looking for a Code Review of a Bioinformatics POC
Jon Degenhardt
jond at noreply.com
Fri Jun 12 03:32:48 UTC 2020
On Friday, 12 June 2020 at 00:58:34 UTC, duck_tape wrote:
> On Thursday, 11 June 2020 at 23:45:31 UTC, H. S. Teoh wrote:
>>
>> Hmm, looks like it's not so much input that's slow, but
>> *output*. In fact, it looks pretty bad, taking almost as much
>> time as overlap() does in total!
>>
>> [snip...]
>
> I'll play with that a bit tomorrow! I saw a nice implementation
> on eBay's tsvutils that I may need to look closer at.
>
> Someone else suggested that stdout flushes per line by default.
> I dug around the stdlib but couldn't confirm that. I also played
> around with setvbuf but it didn't seem to change anything.
>
> Have you run into that before / know if stdout is flushing
> every newline? I'm not above opening '/dev/stdout' as a file if
> that writes faster.

I put some comparative benchmarks in
https://github.com/jondegenhardt/dcat-perf. It compares input
and output using standard Phobos facilities (File.byLine,
File.write), iopipe (https://github.com/schveiguy/iopipe), and
the tsv-utils buffered input and buffered output facilities.

I haven't spent much time on results presentation, and I know the
results are not that easy to read and interpret. Brief summary -
On files with short lines, buffering will result in dramatic
throughput improvements over the standard Phobos facilities. This
is true for both input and output, though likely for different
reasons. For input, iopipe is the fastest available. tsv-utils
buffered facilities are materially faster than Phobos for both
input and output, but not as fast as iopipe for input. Combining
iopipe for input with tsv-utils BufferedOutputRange for output
works pretty well.
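
To make the output-buffering idea concrete, here is a minimal sketch
using only Phobos. It is not the tsv-utils or iopipe implementation:
ChunkedWriter is a hypothetical name made up for illustration, and the
64 KB flush threshold is an arbitrary assumption. It only shows the
general approach of collecting many short lines in memory and handing
them to the File in large chunks.

```d
import std.array : Appender;
import std.stdio : File, stdin, stdout;

/// Hypothetical helper (not a tsv-utils or iopipe API): collects lines in
/// memory and passes them to the File in large chunks.
struct ChunkedWriter
{
    private File sink;
    private Appender!(char[]) buf;
    private size_t flushSize;

    this(File sink, size_t flushSize = 64 * 1024) // threshold is an assumption
    {
        this.sink = sink;
        this.flushSize = flushSize;
    }

    @disable this(this); // disallow copies; a single owner flushes the buffer

    ~this() { flush(); } // write out whatever is still buffered

    /// Appends the line plus a newline; writes only once the buffer is large.
    void putLine(const(char)[] line)
    {
        buf.put(line);
        buf.put('\n');
        if (buf.data.length >= flushSize) flush();
    }

    /// Writes the buffered bytes in one call and empties the buffer.
    void flush()
    {
        if (buf.data.length == 0) return;
        sink.rawWrite(buf.data);
        buf.clear();
    }
}

void main()
{
    auto w = ChunkedWriter(stdout);
    // byLine reuses its buffer, which is fine here: putLine copies the data.
    foreach (line; stdin.byLine) w.putLine(line);
}
```

Whether this beats plain File.write on a given system is something to
measure rather than assume; the dcat-perf repo is the place to look
for actual numbers.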

For files with long lines, both iopipe and tsv-utils
BufferedByLine are materially faster than Phobos File.byLine when
reading. For writing, there wasn't much difference from Phobos
File.write.

A note on File.byLine - I've had many opportunities to compare
Phobos File.byLine to facilities in other programming languages,
and it is not bad at all. But it is beatable.
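
For reference, the standard Phobos baseline (File.byLine for input,
File.write for output) looks roughly like the sketch below. copyLines
is a hypothetical helper name used here for illustration; the actual
benchmark programs in the dcat-perf repo may be structured differently.

```d
import std.stdio : File, stdin, stdout;
import std.typecons : Yes;

/// Hypothetical helper: copies input to output one line at a time
/// using only standard Phobos facilities.
void copyLines(File input, File output)
{
    // Keep the terminator so each line is written back unchanged.
    foreach (line; input.byLine(Yes.keepTerminator))
        output.write(line);
}

void main()
{
    copyLines(stdin, stdout);
}
```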

About Memory Mapped Files - The benchmarks don't include a
comparison against mmfile. It would certainly make sense as a
comparison point.
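
As a rough idea of what such a comparison point might look like, here
is a sketch that scans a file through std.mmfile. It is not part of
the benchmarks, and countNewlines is a hypothetical name. It assumes a
regular, non-empty file that can be mapped read-only, so unlike the
other approaches it would not work on a pipe.

```d
import std.mmfile : MmFile;

/// Hypothetical helper: counts newline bytes by scanning a read-only
/// memory mapping of the file. Assumes the file is non-empty.
size_t countNewlines(string path)
{
    auto mmf = new MmFile(path);             // opens the file for reading
    scope (exit) destroy(mmf);               // unmap when done
    auto bytes = cast(const(ubyte)[]) mmf[]; // view of the whole mapping
    size_t n = 0;
    foreach (b; bytes)
        if (b == '\n') ++n;
    return n;
}

void main(string[] args)
{
    import std.stdio : writeln;
    if (args.length > 1) writeln(countNewlines(args[1]));
}
```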

--Jon