Code speed (and back to the memory leaks...)

Joseph Wakeling joseph.wakeling at webdrake.net
Wed Apr 14 14:19:53 PDT 2010


Hi Bearophile,

Thanks ever so much for the amount of effort you put into this -- it is
extremely kind and generous. :-)

I'm not seeing the significant gains that you see with DMD, but for me
too LDC doubles the speed.

My guess for the difference is that you ran fewer than the full 100
iterations of the main for loop.  As you can see in the code, the Yzlm
class launches an iterative procedure which takes more or fewer
iterations to converge depending on initial conditions.

This is _normally_ relatively few, but can in some circumstances be
large (several hundred).

An effect of changing the random number generator is that it changes the
initial conditions presented to the algorithm.  For example, in the
first 10 runs of the loop, the original code performs 404 iterations
overall and your new code performs only 348.
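
(To be concrete about what I'm counting as an 'iteration': each run loops
until the reputation values stop changing, along the lines of the sketch
below.  The names and the update rule here are made up purely for
illustration -- this is not the real Yzlm code.)

/////////////////////////////////////////////////////////////////////////
// Hypothetical stand-in for one run of the iterative procedure: the
// return value is the number of iterations needed to converge, which is
// what the counts above (404 vs. 348) sum over the first 10 runs.
size_t runToConvergence(double[] reputation, double tolerance = 1e-12)
{
    size_t iterations = 0;
    double diff;

    do {
        diff = 0.0;
        foreach (ref r; reputation) {
            immutable updated = 0.5 * (r + 1.0);  // placeholder update rule
            diff += (updated - r) * (updated - r);
            r = updated;
        }
        ++iterations;
    } while (diff > tolerance);

    return iterations;
}

void main()
{
    import std.stdio : writeln;
    double[] rep = [0.1, 0.5, 0.9];
    writeln(runToConvergence(rep), " iterations to converge");
}
/////////////////////////////////////////////////////////////////////////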

Combine that with other savings, like the removal of the appender and
the faster random number generation, and you get a fairly sizeable gain:
about a 25-30% cut in running time when compiling with dmd.  But those
savings have less influence on overall running time as you increase the
number of simulations.
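
(For reference, the two appending styles in question look roughly like
this.  It's just a sketch assuming Phobos' std.array.appender -- the
data is made up, it's only to show the shape of the change.)

/////////////////////////////////////////////////////////////////////////
import std.array : appender;

// Build an array with the Appender helper, which batches up the
// bookkeeping of repeated appends...
double[] buildWithAppender(size_t n)
{
    auto app = appender!(double[])();
    foreach (i; 0 .. n)
        app.put(cast(double) i);
    return app[];
}

// ...versus plain ~= appends, where each append is a call into the GC
// runtime to check capacity.
double[] buildWithConcat(size_t n)
{
    double[] result;
    foreach (i; 0 .. n)
        result ~= cast(double) i;
    return result;
}

void main()
{
    assert(buildWithAppender(1000) == buildWithConcat(1000));
}
/////////////////////////////////////////////////////////////////////////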

The _real_ measure of speed is thus not overall program running time but
running time relative to the total number of iterations.  The good news
is that while the LDC-compiled version doesn't beat g++, it comes pretty
close, with about 26 iterations per second for the D code compared to 32
for the C++. :-)
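
(Concretely, by 'iterations per second' I just mean the total iteration
count over all runs divided by the elapsed wall-clock time, i.e.
something along these lines -- assuming core.time's Duration; the
numbers in main are illustrative only, not measurements.)

/////////////////////////////////////////////////////////////////////////
import core.time : Duration, dur;
import std.stdio : writefln;

// Iterations per second: total convergence iterations accumulated over
// all simulation runs, divided by the elapsed wall-clock time.
double iterationsPerSecond(size_t totalIterations, Duration elapsed)
{
    immutable seconds = elapsed.total!"hnsecs" / 10_000_000.0;
    return totalIterations / seconds;
}

void main()
{
    // Illustrative values only.
    writefln("%.1f iterations per second",
             iterationsPerSecond(404, dur!"seconds"(15)));
}
/////////////////////////////////////////////////////////////////////////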

The C++ code can probably be optimised a bit further but not too much;
the algorithms in the iterative process are pretty much identical
between the two versions.  (The C++ doesn't have the 'clever' memory
swap-around that I put in the D code, but that doesn't make any
difference to D's performance anyway...)  So I think it's probably just
compiler differences that account for the speed gap -- which is fine.
After all, D in general and DMD in particular are still in development.

I _am_ surprised that in your profiling the random number generator took
up so much time, because if you cut out the lines,

        aa(ratings, reputationUser, reputationObject);
        yzlm(ratings, reputationUser, reputationObject);

... and recompile, the program screams through in no time at all --
which would lead me to think that it's the yzlm opCall and the functions
it calls that take up the majority of time.  Or is there some lazy
evaluation going on here ... ?
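
(The crude check I'd do next -- just a sketch, using core.time's
monotonic clock rather than the profiler, with dummy workloads standing
in for my actual aa and yzlm calls -- is to time the two calls directly:)

/////////////////////////////////////////////////////////////////////////
import core.time : MonoTime;
import std.stdio : writefln;

void main()
{
    // Dummy workload standing in for the real calls; in my program the
    // timed expressions would be aa(ratings, reputationUser,
    // reputationObject) and yzlm(ratings, reputationUser, reputationObject).
    static double dummyWork(size_t n)
    {
        double s = 0.0;
        foreach (i; 0 .. n)
            s += i * 1e-6;
        return s;
    }

    auto t0 = MonoTime.currTime;
    auto a = dummyWork(1_000_000);   // stands in for the aa(...) call
    auto t1 = MonoTime.currTime;
    auto b = dummyWork(2_000_000);   // stands in for the yzlm(...) call
    auto t2 = MonoTime.currTime;

    writefln("first call took %s, second call took %s (checksums %s, %s)",
             t1 - t0, t2 - t1, a, b);
}
/////////////////////////////////////////////////////////////////////////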

I don't think it's the algorithm in any case (although it might be
Phobos' implementation of it), since when working with Tango and LDC,
switching between the Twister and Kiss random number generators doesn't
make any meaningful performance difference.
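
(For the record, that comparison was done just by swapping the engine
type; in Phobos terms it would look something like the sketch below.
Kiss is a Tango generator with no direct Phobos equivalent, so
std.random's Xorshift stands in here for the 'other' engine.)

/////////////////////////////////////////////////////////////////////////
import std.random : Mt19937, Xorshift, uniform, unpredictableSeed;
import std.stdio : writefln;

// The reputation code only needs a stream of uniform deviates, so the
// engine can be a template parameter and swapped without touching the
// algorithm itself.
double sumOfUniforms(Rng)(size_t n, ref Rng rng)
{
    double total = 0.0;
    foreach (i; 0 .. n)
        total += uniform(0.0, 1.0, rng);
    return total;
}

void main()
{
    auto twister  = Mt19937(unpredictableSeed);
    auto xorshift = Xorshift(unpredictableSeed);

    writefln("Mersenne Twister: %s", sumOfUniforms(1_000_000, twister));
    writefln("Xorshift:         %s", sumOfUniforms(1_000_000, xorshift));
}
/////////////////////////////////////////////////////////////////////////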

One last thing.  While comparing DMD and LDC I noticed something odd.
Take this code, which is an LDC and DMD-compatible version of the 'array
leak' code I posted originally:

/////////////////////////////////////////////////////////////////////////
version(Tango) {
    import tango.stdc.stdio: printf;
} else {
    import std.stdio: printf;
}

void main()
{
    double[] x;

    for(uint i=0;i<100;++i) {
        x.length = 0;

        for(uint j=0;j<5_000;++j) {
            for(uint k=0;k<1_000;++k) {
                x ~= j*k;
            }
        }

        // cast needed because x.length is a size_t, which may not match %u
        printf("At iteration %u, x has %u elements.\n", i, cast(uint) x.length);
    }
}
/////////////////////////////////////////////////////////////////////////

I noticed that when compiled with LDC the memory usage stays constant,
but in DMD the memory use blows up.  Is this a D1/D2 difference (with D2
having the more advanced features that require preservation of memory)
or is it a bug in one or the other of the compilers ... ?
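
(One thing I mean to try, in case it narrows this down: D2 has
assumeSafeAppend in object, which as I understand it tells the runtime
it may reuse the block beyond the array's new end instead of
reallocating on the next append.  A variant along these lines -- DMD
only, since the function doesn't exist under Tango, and assuming a DMD
recent enough to have it -- would show whether that's the mechanism
involved:)

/////////////////////////////////////////////////////////////////////////
import core.stdc.stdio : printf;

void main()
{
    double[] x;

    for (uint i = 0; i < 100; ++i) {
        x.length = 0;
        x.assumeSafeAppend();   // let the runtime reuse the existing block
                                // instead of reallocating on the next ~=

        for (uint j = 0; j < 5_000; ++j) {
            for (uint k = 0; k < 1_000; ++k) {
                x ~= j * k;
            }
        }

        printf("At iteration %u, x has %u elements.\n", i, cast(uint) x.length);
    }
}
/////////////////////////////////////////////////////////////////////////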

Thanks & best wishes,

    -- Joe


