top time wasters in DMD, as reported by gprof - VS2010/VTune results

Kiith-Sa kiithsacmp at gmail.com
Tue Jun 25 02:27:40 PDT 2013


On Tuesday, 25 June 2013 at 06:21:09 UTC, dennis luehring wrote:
> Am 25.06.2013 07:51, schrieb dennis luehring:
>> Am 24.06.2013 18:15, schrieb Richard Webb:
>>> DMD built with DMC takes ~49 seconds to complete, but DMD 
>>> build
>>> with VC2008 only takes ~12 seconds. (Need to get a proper VC
>>> build done to test it properly).
>>> Looks like the DMC build spends far more time allocating 
>>> memory,
>>> even though the peak memory usage is only slightly lower in 
>>> the
>>> VS version?
>>
>> i've done VS2012 + Intel VTune Amp XE 2013 profiling - see the 
>> attached
>> zipped csv file
>>
>>
>
> the AMD CodeXL results are also different - both  VTune and 
> CodeXL fully integrated into VS2010 and using "same" settings
>
> btw nice to read: 
> http://www.yosefk.com/blog/how-profilers-lie-the-cases-of-gprof-and-kcachegrind.html


GProf tends to be pretty useless for actual profiling in my 
experience.

I think the best way is to use a sampling profiler such as 
'perf' (part of the Linux kernel project; on a recent 
Debian/Ubuntu/Mint, type 'perf' into a console to get info 
about which package to install, docs at
https://perf.wiki.kernel.org/index.php/Tutorial),
'oprofile' (pretty much the same feature set as perf, but 
sometimes hard to set up) or the VTune mentioned here. Never 
expect gprof to give you reliable data about how much time each 
function takes. Callgrind/KCachegrind is also pretty good if 
your code doesn't spend a lot of time on I/O, system calls, etc. 
(the main code runs in a slow VM, so anything not running in 
that VM will seem to run much faster).
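For example, a typical perf session looks something like this (a 
minimal sketch, assuming perf is installed; './dmd somefile.d' is 
just a stand-in for whatever command you want to profile):

```shell
# Record samples while the program runs; -g also collects
# call-graph data so you can see who calls the hot functions
perf record -g ./dmd somefile.d

# Interactive per-function breakdown of where time was spent
perf report

# Drill down to per-line / per-instruction annotation
perf annotate
```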

Furthermore, _none_ of these requires compiling with special 
flags. As for debug symbols, it's best to enable optimizations 
together with debug symbols. Optimizations are not a big issue: 
even if some functions get inlined, these tools give you 
per-line and per-instruction results. Not to mention cache 
hits/misses, branches, branch mispredictions, and (if you use 
CPU-specific event IDs) whatever else your CPU can record. AND 
it doesn't measurably affect the performance of the profiled 
code, unless you set an insanely high sample rate.
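Concretely, "optimizations plus debug symbols" is just the usual 
flags combined (sketch only; 'myprog' is a placeholder name):

```shell
# DMD: optimize and inline, but keep debug symbols for the profiler
dmd -O -inline -g myprog.d

# GCC/Clang equivalent for a C/C++ build (e.g. DMD's own source)
g++ -O2 -g -o myprog myprog.cpp
```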

And if this sounds difficult to configure, most of these tools 
(perf at the very least) have very sane defaults that give way 
more useful results than gprof.

TLDR: gprof is horrible. Never use it for profiling. There are 
approximately 5 billion better tools that give more detailed 
results _and_ are easier to use.

I seriously need to write a blog post/article about this.



More information about the Digitalmars-d mailing list