Looking for a Code Review of a Bioinformatics POC

H. S. Teoh hsteoh at quickfur.ath.cx
Fri Jun 12 22:11:53 UTC 2020


On Fri, Jun 12, 2020 at 12:11:44PM +0000, duck_tape via Digitalmars-d-learn wrote:
> On Friday, 12 June 2020 at 12:02:19 UTC, duck_tape wrote:
> > For speedups with getting my hands dirty:
> > - Does writef and company flush on every line? I still haven't found
> > the source of this.

writef et al. ultimately go through LockingTextWriter in
std.stdio.File:

https://github.com/dlang/phobos/blob/master/std/stdio.d#L2890

Looks like it's doing some Unicode manipulation and writing character by
character -- a pretty slow proposition IMO!  It was done this way for
Unicode-correctness, AFAICT, but if you already know the final form your
output is going to take, directly calling fwrite(), or the D wrapper
File.rawWrite(), will probably give you a significant performance boost.
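
To make that concrete, here is a minimal sketch of the "format into memory,
then write once" approach. The helper name renderRecords and the
tab-separated record layout are made up for illustration; only appender,
formattedWrite and File.rawWrite come from Phobos.

```d
import std.array : appender;
import std.format : formattedWrite;
import std.stdio : stdout;

/// Hypothetical helper: formats every record into one in-memory buffer.
string renderRecords(const(string)[] names, const(int)[] counts)
{
    auto buf = appender!string();
    foreach (i, name; names)
        buf.formattedWrite("%s\t%d\n", name, counts[i]);
    return buf.data;
}

void main()
{
    auto text = renderRecords(["chr1", "chr2"], [10, 20]);
    // rawWrite passes the bytes to the C stream in a single bulk write.
    stdout.rawWrite(text);
}
```

Whether this actually beats writef for your data is something to measure,
not assume.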


> > - It looks like I could use {f}printf if I really wanted to:
> > https://forum.dlang.org/post/hzcjbanvkxgohkbvjnkv@forum.dlang.org

Be aware that D strings, other than string literals, are generally NOT
null-terminated, so you need to call std.string.toStringz before calling
fprintf, otherwise you might be in for a nasty surprise. :-P  Other than
that, calling C from D is pretty easy:

	import std.string : toStringz;

	// Declared by hand for illustration; core.stdc.stdio already provides it.
	extern(C) int printf(const(char)* format, ...);

	void myDCode(string data) {
		printf("%s\n", data.toStringz); // calls C printf
	}
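
If you'd rather not build a null-terminated copy for every line, the usual
C idiom is the %.*s format, which takes an explicit length. A small sketch,
assuming the slice length fits in an int; printSlice is just a name I made
up:

```d
import core.stdc.stdio : printf;

/// Hypothetical helper: prints a D slice through C printf without a copy.
void printSlice(const(char)[] data)
{
    // %.*s consumes an int length followed by a char pointer,
    // so the slice does not need a terminating null byte.
    printf("%.*s\n", cast(int) data.length, data.ptr);
}

void main()
{
    string line = "chr1\t100\t200";
    printSlice(line[0 .. 4]); // prints only "chr1"
}
```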


> On Friday, 12 June 2020 at 12:02:19 UTC, duck_tape wrote:
> 
> Switching to using `core.stdc.stdio.printf` shaved off nearly two
> seconds (11->9)!
> 
> Once I wrap this up for submission to biofast I will play with
> memmapping / iopipe / tsvutils buffered writers. Sambamba is also
> doing some non-standard tweaks to its outputting as well.
> 
> I'm still convinced that stdout is flushing by line.

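That seems likely if you're outputting to a terminal, where C stdio
normally line-buffers stdout. If the output is redirected to a file or
pipe, stdout is normally fully buffered, so the slowdown is more likely
caused by the Unicode manipulation code inside LockingTextWriter.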
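
If you want to rule out line flushing as a factor, you can ask for full
buffering explicitly. A minimal sketch using File.setvbuf; the 64 KiB
buffer size is an arbitrary choice and writeLines is a hypothetical helper:

```d
import core.stdc.stdio : _IOFBF;
import std.stdio : File, stdout;

/// Hypothetical helper: switches `f` to full buffering, then writes n lines.
void writeLines(File f, int n)
{
    // setvbuf has to be called before any other I/O on the stream.
    f.setvbuf(64 * 1024, _IOFBF); // 64 KiB is an arbitrary size
    foreach (i; 0 .. n)
        f.writeln("line ", i);
    f.flush();
}

void main()
{
    writeLines(stdout, 3);
}
```

If the timings don't change with this in place, flushing probably isn't
the bottleneck.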

On that note, somebody should get to the bottom of this, and submit a PR
to Phobos with a fast-track path for the (IMO very common) case where
the string can just be fwrite'd straight into output. AFAICT, all the
extra baggage currently in LockingTextWriter is mainly to deal with the
case where the OS expects a (slightly) different encoding for text than
is internally represented, e.g., classic 0x0D 0x0A DOS line endings
(which I heard are obsolete these days, so even that case may not be as
common as it used to be anymore), or outputting UTF-16 to UTF-8 or vice
versa.  I'm skeptical whether this is the common case these days, so
having a fast path for UTF-8 -> UTF-8 (i.e., just fwrite the whole
thing straight to file) will be a good improvement for D.
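
Roughly what I have in mind, as a user-level sketch and NOT the actual
Phobos code: fastPut is a made-up name, and it ignores details such as
wide-oriented streams that a real PR would have to handle.

```d
import std.stdio : File, stdout;

/// Hypothetical sketch of a UTF-8 fast path; not the Phobos implementation.
void fastPut(S)(File f, S text)
{
    static if (is(S : const(char)[]))
    {
        // UTF-8 in, UTF-8 out: hand the whole slice to one bulk write.
        const(char)[] bytes = text;
        f.rawWrite(bytes);
    }
    else
    {
        // wstring/dstring: let the existing Phobos path do the transcoding.
        f.write(text);
    }
}

void main()
{
    fastPut(stdout, "UTF-8 text\n");   // bulk path
    fastPut(stdout, "UTF-16 text\n"w); // transcoding path
}
```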


T

-- 
Nobody is perfect.  I am Nobody. -- pepoluan, GKC forum

