Looking for a Code Review of a Bioinformatics POC

duck_tape sstadick at gmail.com
Fri Jun 12 12:02:19 UTC 2020


On Friday, 12 June 2020 at 07:25:09 UTC, Jon Degenhardt wrote:
> tsv-utils has the advantage of only needing to support utf-8 
> files with Unix newlines, so the code is simpler. (Windows 
> newlines are detected, this occurs separately from 
> bufferedByLine.) But as you describe, support for a wider 
> variety of input cases could be done without sacrificing basic 
> performance. iopipe provides much more generic support, and it 
> is quite fast.

I will have to look into iopipe for sure. All this info is great. 
For this particular benchmark the goal is just to show off some 
'high-level' languages and how close to C they can get. If I can 
avoid going way into the weeds writing my own output methods, 
that's more in the spirit of things.

However, I do intend to be using D for bioinformatics, which is 
incredibly IO intensive, so much of this will be put to good use.

For speedups with getting my hands dirty:
- Do writef and company flush on every line? I still haven't 
found the source for this.
- It looks like I could use {f}printf if I really wanted to: 
https://forum.dlang.org/post/hzcjbanvkxgohkbvjnkv@forum.dlang.org
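
To make the two options above concrete, here is a minimal sketch, not a 
measured result. It assumes (as far as I understand it) that std.stdio 
writes through the C runtime's FILE*, so whether each line is flushed 
depends on the C buffering mode rather than on writef itself. The helper 
name writeLines, the 64 KiB buffer size and the made-up record format 
are all hypothetical, chosen only for illustration.

```d
import std.stdio : stdout;
import std.format : formattedWrite;
import core.stdc.stdio : printf, _IOFBF;

/// Hypothetical helper: formats `n` short tab-separated lines into any
/// output range `w`. The record layout is invented for this sketch.
void writeLines(W)(ref W w, size_t n)
{
    foreach (i; 0 .. n)
        w.formattedWrite("chr1\t%d\t%d\n", i, i + 10);
}

void main()
{
    // Ask the C runtime for full buffering before any output happens.
    // 64 KiB is an arbitrary size, not a tuned value.
    stdout.setvbuf(64 * 1024, _IOFBF);

    {
        // Take the stream lock once instead of once per write call.
        auto w = stdout.lockingTextWriter();
        writeLines(w, 5);
    }

    // The C function is also callable directly from D.
    printf("%d\n", 42);

    stdout.flush();
}
```

Whether this is actually faster than plain writefln for lines this short 
would have to be benchmarked; the sketch only shows where the knobs are.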


It's particularly interesting what is said about short lines 
doing worse, because the lines in this benchmark are pretty short, 
usually fewer than 20 characters.

