D and i/o

Jon Degenhardt jond at noreply.com
Sun Nov 10 19:41:52 UTC 2019

On Saturday, 9 November 2019 at 23:39:09 UTC, bioinfornatics wrote:
> Dear,
> In my field we are io bound thus I would like to have our tools 
> fast as I can read a file.
> Thus I started some dummy bench which count the number of lines.
> The result is compared to wc -l command. The line counting is 
> only a pretext to evaluate the io, this process can be switched 
> by any io processing. Thus we use much as possible the buffer 
> instead the byLine range. Moreover such range imply that the 
> buffer was read once before to be ready to process.
> https://github.com/bioinfornatics/test_io
> Ideally I would like to process a shared buffer through 
> multiple core and run a simd computation. But it is not yet 
> done.

You might also be interested in a similar I/O performance test I 
created: https://github.com/jondegenhardt/dcat-perf. This one is 
based on 'cat' (copy to standard output) rather than 'wc', as I'm 
interested in both input and output, but the general motivation 
is similar. I specifically wanted to compare native Phobos 
facilities to those in iopipe and some Phobos covers in 
tsv-utils. Most tests are line-based, as I'm interested in 
record-oriented operations, but chunk-based copying is included.
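
For readers unfamiliar with the two styles, here is a minimal sketch of a 'cat'-like copy done by line and by chunk using only Phobos std.stdio. It is not the code from dcat-perf; the function names, the "chunk" argument, and the 64 KiB chunk size are assumptions chosen for illustration.

```d
import std.stdio;

/// Copy input to output one line at a time (record-oriented).
void copyByLine(File input, File output)
{
    // KeepTerminator.yes keeps each newline, so the output matches the input.
    foreach (line; input.byLine(KeepTerminator.yes))
        output.write(line);
}

/// Copy input to output in fixed-size chunks, ignoring line boundaries.
void copyByChunk(File input, File output, size_t chunkSize = 64 * 1024)
{
    foreach (ubyte[] chunk; input.byChunk(chunkSize))
        output.rawWrite(chunk);
}

void main(string[] args)
{
    // Hypothetical switch: pass "chunk" as the first argument for chunked copy.
    if (args.length > 1 && args[1] == "chunk")
        copyByChunk(stdin, stdout);
    else
        copyByLine(stdin, stdout);
}
```

The by-line version does work per record, so its cost depends on how many lines the input has, while the chunked version does not look at line boundaries at all.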

A general observation is that if lines are involved, it's 
important to measure performance of both short and long lines. 
This may even affect 'wc' when reading by chunk or from 
memory-mapped files, see H. S. Teoh's observations on 'wc' 
performance: 

As an aside, my preliminary conclusion is that Phobos facilities 
are overall quite good (based on tsv-utils comparative 
performance benchmarks), but are non-optimal when short lines are 
involved. This is the case for both input and output. Both the 
tsv-utils covers and iopipe are better. iopipe is the best for 
input, but it appears to need some further work on the output 
side (or I don't know iopipe well enough). By "preliminary", I 
mean just that. There could certainly be mistakes or incomplete 
analysis in the tests.
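
To make the short-versus-long-line point concrete, here is a rough sketch of how one might build two inputs of about the same size but very different line lengths and time a byLine pass over each. The file names, line lengths, and total size are made up for illustration, and a single timed pass like this is only a starting point, not a careful benchmark.

```d
import std.array : replicate;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio;

/// Write roughly totalBytes to path as lines of lineLength characters each.
void writeTestFile(string path, size_t lineLength, size_t totalBytes)
{
    auto f = File(path, "w");
    auto line = replicate("x", lineLength);
    // Each line costs lineLength bytes plus one newline.
    foreach (_; 0 .. totalBytes / (lineLength + 1))
        f.writeln(line);
}

/// Count lines with byLine and return how many were read.
size_t countByLine(string path)
{
    size_t n;
    foreach (line; File(path).byLine)
        ++n;
    return n;
}

void main()
{
    // Hypothetical sizes and file names, chosen only for illustration.
    enum totalBytes = 64 * 1024 * 1024;
    writeTestFile("short_lines.txt", 8, totalBytes);
    writeTestFile("long_lines.txt", 4096, totalBytes);

    foreach (path; ["short_lines.txt", "long_lines.txt"])
    {
        auto sw = StopWatch(AutoStart.yes);
        auto n = countByLine(path);
        writefln("%s: %s lines in %s ms", path, n, sw.peek.total!"msecs");
    }
}
```

Both files hold about the same number of bytes, so any difference in elapsed time comes mostly from per-line overhead rather than from raw I/O volume.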

