d word counting approach performs well but has higher memory usage

Jon Degenhardt jond at noreply.com
Sun Nov 4 20:24:02 UTC 2018


On Saturday, 3 November 2018 at 14:26:02 UTC, dwdv wrote:
> Hi there,
>
> the task is simple: count word occurrences from stdin (around 
> 150mb in this case) and print sorted results to stdout in a 
> somewhat idiomatic fashion.
>
> Now, d is quite elegant while maintaining high performance 
> compared to both c and c++, but I, as a complete beginner, 
> can't identify where the 10x memory usage (~300mb, see results 
> below) is coming from.
>
> Unicode overhead? Internal buffer? Is something slurping the 
> whole file? Assoc array allocations? Couldn't find huge allocs 
> with dmd -vgc and -profile=gc either. What did I do wrong?

Not exactly the same problem, but there is relevant discussion in 
the blog post I wrote a while ago:  
https://dlang.org/blog/2017/05/24/faster-command-line-tools-in-d/

See in particular the section on Associative Array lookup 
optimization. It takes advantage of the fact that the immutable 
string key only needs to be created the first time a key is 
entered into the hash; subsequent occurrences can skip that step. 
Since creating the string allocates new memory, even if it is 
only used temporarily, avoiding it is a meaningful savings.
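
Roughly, that pattern looks like the sketch below (illustrative 
only, and simplified to splitting on single spaces):

    import std.algorithm : splitter;
    import std.conv : to;
    import std.stdio : stdin, writeln;

    void main()
    {
        uint[string] counts;
        // byLine reuses its char[] buffer from one iteration to the next
        foreach (line; stdin.byLine)
        {
            foreach (word; line.splitter(' '))
            {
                // The lookup hashes the char[] slice directly; no allocation
                auto countPtr = word in counts;
                if (countPtr !is null)
                    ++(*countPtr);              // seen before: bump in place
                else
                    counts[word.to!string] = 1; // first time: copy to an immutable key
            }
        }
        writeln(counts.length, " distinct words");
    }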

Additional APIs have been added to the AA interface since I wrote 
the blog post; I believe it is now possible to accomplish the 
same thing with more succinct code.
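
For example, I believe the 'update' primitive in druntime can 
express the insert-or-increment in a single call. A sketch (not 
benchmarked, and note it takes the immutable key, so on its own 
it does not replace the char[] lookup trick above):

    void bump(ref uint[string] counts, string word)
    {
        // 'update' runs the first callable when the key is absent and the
        // second when it is present, with a single AA lookup.
        counts.update(word,
                      () => 1u,
                      (ref uint c) => c + 1);
    }

    void main()
    {
        uint[string] counts;
        bump(counts, "hello");
        bump(counts, "hello");
        assert(counts["hello"] == 2);
    }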

Other optimization possibilities:
* Avoid auto-decode: Not sure if your code is hitting this, but 
if so it's a significant performance hit. Unfortunately, it's not 
always obvious when it is happening. The task you are performing 
doesn't need auto-decode because it is splitting on single-byte 
UTF-8 characters (newline and space). A byte-level sketch follows 
this list.

* LTO on druntime/phobos: This is easy and gives a material 
speedup. Simply add
         '-defaultlib=phobos2-ldc-lto,druntime-ldc-lto'
to the 'ldc2' build line, after the '-flto=full' entry (a full 
command line is shown after this list). This is a win because it 
enables a number of optimizations in the inner loop.

* Reading the whole file vs line by line - 'byLine' is really 
fast. It's also nice and general, as it allows reading 
arbitrarily large files or standard input without changes to the 
code. However, it's not as fast as reading the file in a single 
shot (a sketch follows this list).

* std.algorithm.joiner - Has improved dramatically, but is still 
slower than a foreach loop (a sketch follows this list). See: 
https://github.com/dlang/phobos/pull/6492
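
On the auto-decode point: one way to be certain it is skipped is 
to work on the raw bytes via std.string.representation. A sketch 
(the space delimiter and the simple word count are just for 
illustration):

    import std.algorithm : splitter;
    import std.stdio : stdin, writeln;
    import std.string : representation;

    void main()
    {
        size_t words;
        foreach (line; stdin.byLine)
        {
            // .representation reinterprets the char[] buffer as ubyte[], so
            // splitter works on raw code units and nothing is auto-decoded.
            foreach (word; line.representation.splitter(cast(ubyte) ' '))
                if (word.length > 0)
                    ++words;
        }
        writeln(words, " words");
    }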
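
On the LTO point: a complete build line would look something like 
the following. The file name and the '-O3 -release' flags are 
only illustrative; the two LTO-related flags are what matter 
here:

    ldc2 -O3 -release -flto=full -defaultlib=phobos2-ldc-lto,druntime-ldc-lto wordcount.d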
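
On reading the whole file in one shot: a sketch that reads a file 
named on the command line (error handling omitted):

    import std.file : readText;
    import std.stdio : writeln;
    import std.string : lineSplitter;

    void main(string[] args)
    {
        // Slurp the whole file in one read, then split it in memory. This is
        // faster than byLine for a single large file, but it gives up the
        // ability to stream from standard input without buffering it all.
        string text = readText(args[1]);
        size_t lineCount;
        foreach (line; text.lineSplitter)
            ++lineCount;
        writeln(lineCount, " lines");
    }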
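
On joiner: for reference, this is the style being compared to the 
nested foreach loops used in the earlier sketches:

    import std.algorithm : joiner, map, splitter;
    import std.stdio : stdin, writeln;

    void main()
    {
        // joiner flattens the per-line word ranges into a single range of
        // words. Convenient, but still slower than writing the loops out.
        auto words = stdin.byLine.map!(line => line.splitter(' ')).joiner;
        size_t n;
        foreach (word; words)
            ++n;
        writeln(n, " words");
    }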

--Jon



