top time wasters in DMD, as reported by gprof - VS2010/VTune results
Kiith-Sa
kiithsacmp at gmail.com
Tue Jun 25 02:27:40 PDT 2013
On Tuesday, 25 June 2013 at 06:21:09 UTC, dennis luehring wrote:
> Am 25.06.2013 07:51, schrieb dennis luehring:
>> Am 24.06.2013 18:15, schrieb Richard Webb:
>>> DMD built with DMC takes ~49 seconds to complete, but DMD
>>> build
>>> with VC2008 only takes ~12 seconds. (Need to get a proper VC
>>> build done to test it properly).
>>> Looks like the DMC build spends far more time allocating
>>> memory,
>>> even though the peak memory usage is only slightly lower in
>>> the
>>> VS version?
>>
>> i've done VS2012 + Intel VTune Amp XE 2013 profiling - see the
>> attached
>> zipped csv file
>>
>>
>
> the AMD CodeXL results are also different - both VTune and
> CodeXL fully integrated into VS2010 and using "same" settings
>
> btw nice to read:
> http://www.yosefk.com/blog/how-profilers-lie-the-cases-of-gprof-and-kcachegrind.html
GProf tends to be pretty useless for actual profiling in my
experience.
I think the best way is to use a sampling profiler such as 'perf'
(part of the Linux kernel project; on a recent Debian/Ubuntu/Mint,
type 'perf' into a console to get info about which package to
install, docs at
https://perf.wiki.kernel.org/index.php/Tutorial),
'oprofile' (pretty much the same feature set as perf, but sometimes
hard to set up), or the VTune mentioned here. Never expect gprof to
give you reliable data about how much time each function takes.
Callgrind/kcachegrind is also pretty good if your code doesn't
spend a lot of time on i/o, system calls, etc. (the main code runs
in a slow VM, so anything not running in that VM will seem to run
much faster by comparison).
Furthermore, _none_ of these tools requires compiling with special
flags. As for debug symbols, it's best to enable optimizations
together with debug symbols. Optimizations are not a big
issue - even if some functions get inlined, these tools give you
per-line and per-instruction results. Not to mention cache
hits/misses, branches, branch mispredictions, and - if you use
CPU-specific event IDs - whatever else your CPU can record. AND it
doesn't measurably affect the performance of the profiled code,
unless you set an insanely high sample rate.
And if this sounds difficult to configure, most of these tools
(perf at the very least) have very sane defaults that give way
more useful results than gprof.
TLDR: gprof is horrible. Never use it for profiling. There are
approximately 5 billion better tools that give more detailed
results _and_ are easier to use.
I seriously need to write a blog post/article about this.
More information about the Digitalmars-d
mailing list