howto count lines - fast

H. S. Teoh via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Tue May 30 17:22:28 PDT 2017


P.S. After I posted the code, I took a closer look at the disassembly
and found that gdc wasn't generating the best code for the parallel
foreach loop body.  I haven't fully traced the cause yet, but I did find
a simple optimization (arguably a micro-optimization): updating the
subtotal inside the inner loop is a bit inefficient, because the
subtotal is outside the loop body and, due to the way std.parallelism is
implemented, is passed by reference to the loop body. This caused gdc
not to enregister the subtotal, so incrementing it requires a cache
roundtrip at best, a full RAM roundtrip at worst.

So I introduced a local count variable for incrementing, and only assign
to the subtotal array at the end of the block:

-------snip-------
// real    0m0.240s
// user    0m1.045s
// sys     0m0.059s
size_t lineCount4(string filename)
{
    import std.algorithm.comparison : min;
    import std.algorithm.iteration : sum;
    import std.mmfile;
    import std.parallelism;

    auto f = new MmFile(filename);
    auto data = cast(ubyte[]) f[];
    size_t[] subtotals;

    enum blockSize = 16384;
    if (data.length == 0) return 0;
    subtotals.length = 1 + (data.length - 1) / blockSize;

    foreach (j, ref subtotal; parallel(subtotals))
    {
        size_t end = min(blockSize*(j+1), data.length);
        size_t c = 0;
        foreach (i; blockSize*j .. end)
        {
            if (data[i] == '\n') c++;
        }
        subtotal = c;
    }
    return subtotals.sum;
}
-------snip-------

A look at the disassembly showed that gdc has enregistered the subtotal,
and thereby able to eliminate a branch from the inner loop using a sete
instruction and an add for the if statement.  The measured performance
is only marginally better, though, and doesn't change the overall
performance in a significant way.  But I thought this might be a useful
tip for people who want to squeeze out the last drop of juice from their
CPUs. ;-)

Next, I'll have to look into why gdc didn't unroll the inner loop, like
I thought it would.  But I'm out of time now, so that will have to wait
for another day.


T

-- 
The peace of mind---from knowing that viruses which exploit Microsoft system vulnerabilities cannot touch Linux---is priceless. -- Frustrated system administrator.


More information about the Digitalmars-d-learn mailing list