How to tune numerical D? (matrix multiplication is faster in g++ vs gdc)

Mon Mar 4 09:00:06 PST 2013

On Monday, 4 March 2013 at 15:44:40 UTC, bearophile wrote:
> John Colvin:
>
>> The performance of the multiplication loops and the 
>> performance of the allocation are separate issues and should 
>> be measured as such, especially if one wants to make 
>> meaningful optimisations.
>
> If you want to improve the D compiler, druntime, etc, then I 
> agree you have to separate the variables and test them one at a 
> time. But if you are comparing languages+runtimes+libraries 
> then it's better to not cheat, and test the whole running 
> (warmed) time.
>
> Bye,
> bearophile

I disagree. Information about which parts of the code are running 
fast and which are running slow is critical to optimisation. If 
you don't know whether it's the D memory allocation that's slow 
or the D multiplication loops, you're trying to optimise 
blindfolded.

Even if all you're doing is a comparison, it's a lot more useful 
to know *where* the slowdown is happening so that you can make a 
meaningful analysis of the results.

Enter a strange example:
I found that malloc'ed multi-dimensional arrays were considerably 
faster to iterate over and assign to than D GC slices, even with 
the GC disabled after allocation and bounds checks turned off.

If I hadn't bothered to time the allocation and the iteration 
separately, I would never have noticed this effect and would 
instead have written it off as purely "malloc is faster at 
allocating than the GC".
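
To make "separate timings" concrete, here is a minimal sketch of the kind of measurement I mean. It is not the benchmark I actually ran: the sizes, the `fill` helper and the flat (rather than multi-dimensional) buffers are all made up for illustration, and it assumes a Phobos recent enough to ship `std.datetime.stopwatch` (older releases have `StopWatch` in `std.datetime` instead). Bounds checks would be switched off through the compiler's own flags.

```d
import core.memory : GC;
import core.stdc.stdlib : free, malloc;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writefln;

// Arbitrary sizes, chosen only for illustration.
enum size_t rows = 512, cols = 512;

// Hypothetical helper: assign to every element and return a checksum,
// so the compiler cannot discard the loop as dead code.
double fill(double[] data)
{
    double sum = 0;
    foreach (i, ref x; data)
    {
        x = i * 0.5;
        sum += x;
    }
    return sum;
}

void main()
{
    auto sw = StopWatch(AutoStart.yes);

    // GC slice: time the allocation on its own.
    auto gcData = new double[](rows * cols);
    auto gcAlloc = sw.peek();

    // Disable collections after allocating, as in the case described above.
    GC.disable();
    scope (exit) GC.enable();

    // GC slice: time the iteration/assignment on its own.
    sw.reset();
    immutable gcSum = fill(gcData);
    auto gcIter = sw.peek();

    // malloc'ed buffer: time the allocation on its own.
    sw.reset();
    auto p = cast(double*) malloc(rows * cols * double.sizeof);
    assert(p !is null, "malloc failed");
    scope (exit) free(p);
    auto cData = p[0 .. rows * cols]; // slice over the C allocation
    auto cAlloc = sw.peek();

    // malloc'ed buffer: time the iteration/assignment on its own.
    sw.reset();
    immutable cSum = fill(cData);
    auto cIter = sw.peek();

    // Both buffers must end up holding the same values.
    assert(gcSum == cSum);

    writefln("GC slice: alloc %s, iterate %s", gcAlloc, gcIter);
    writefln("malloc:   alloc %s, iterate %s", cAlloc, cIter);
}
```

With four numbers instead of one total, a gap in the "iterate" column cannot be mistaken for an allocation cost. A single run like this is noisy, so a real comparison would repeat each phase many times and include a warm-up pass.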