profiling issues

Kiith-Sa via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Thu Sep 11 21:22:04 PDT 2014


On Friday, 12 September 2014 at 03:23:55 UTC, Vlad Levenfeld 
wrote:
> I've got a library I've been building up over a few projects, 
> and I've only ever run it under "debug" "unittest" and 
> "release" (with dub "buildOptions").
> Lately I've needed to control the performance more carefully, 
> but unfortunately trying to compile with dub --profile gives me 
> some strange errors:
>
> 1) A few lines in one of my modules are reported as 
> "unreachable" by dmd. The data they operate on are defined 
> entirely in code (i.e. not read as external input) so maybe 
> they're getting CTFE'd into oblivion?
> All I know is they're apparently reachable in non-profiled code 
> (and very essential to the business logic... but they're just 
> math functions, nothing crazy, one of the unreachable lines 
> computes the areas of some polygons, another sums the areas up).
>
> 2) The linker complains about undefined references to 
> std.exception.enforce being called from std.stdio.rawRead.
>
> 3) If I try to compile with "buildOptions":["profile"] instead 
> of dub --profile, then it compiles and links but then I 
> segfault on launch at gc_malloc.
>
> I also recall (but can't seem to find) something about 
> profiling not working with multithreaded code? Because almost 
> every encapsulated service in this library runs on its own 
> thread.
>
> And the code base (>15k LOC) isn't easily reduced, as any 
> remotely interesting main method I write pretty much pulls from 
> the entire library. I don't want to have to turn this whole 
> thing inside out. It's like 95% templates, and inlining wreaks 
> havoc on the logic as well, but that's another problem for 
> another day...
>
> Does anyone else have these kinds of issues? Are there any 
> alternative methods of coarse-grained profiling (i.e., not 
> manually peppering timer calls into my code)? What's with the 
> unreachable statements? Any hints on what I can try next to get 
> closer to a performance profile of my code?


'Conventional' instrumenting profilers such as DMD's built-in 
profiler or gprof are pretty useless for getting reliable data, as 
the instrumentation overhead distorts the results. I recommend 
using a sampling profiler.


With sampling profilers you usually get profiling results down to 
source line or even instruction level, and you don't need to 
recompile your binary (though debug symbols are needed for source 
lines). They also tend to be able to measure more than just time 
(e.g. cache misses for individual caches, branches _and_ branch 
mispredictions, FPU usage, etc.).


If you're on Linux, 'perf' is good (on Ubuntu/Mint, and possibly 
other distros, just type 'perf' into the console and it will tell 
you what package to install; usually it's 'linux-tools-common').

https://perf.wiki.kernel.org/index.php/Tutorial

It also has the awesome 'perf top' utility that allows you to 
profile in real-time, like 'top' but with functions instead of 
processes.
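
A minimal sketch of a typical perf session, to make the above 
concrete. The binary name ./myapp is a hypothetical placeholder 
(not from the original question); substitute your own program, 
ideally built with debug symbols (e.g. dmd -g). The event names 
passed to 'perf stat' are common generic ones, but which events 
are available depends on your CPU and kernel.

```shell
#!/bin/sh
# APP is a hypothetical placeholder; point it at your own binary.
APP="${APP:-./myapp}"

if command -v perf >/dev/null 2>&1 && [ -x "$APP" ]; then
    # Sample the program while it runs; -g also records call graphs.
    # The samples are written to ./perf.data.
    perf record -g "$APP"

    # Print per-function sample percentages as plain text
    # (without --stdio, perf report opens an interactive browser).
    perf report --stdio

    # Count hardware events over a whole run instead of sampling.
    perf stat -e cache-misses,branch-misses "$APP"

    STATUS="profiled"
else
    STATUS="skipped"
    echo "perf or $APP not found; nothing profiled"
fi
```

'perf top' takes no program argument and samples live; depending 
on system settings it may need elevated privileges.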

OProfile is good *if you can get it to run*; it's very similar in 
usage to perf, but I almost always run into some issue.


AMD CodeXL is also decent and available on both Linux and Windows, 
although on non-AMD CPUs it can only measure execution time (still 
very useful, down to instruction level).

RotateRight Zoom and Intel VTune should also be good, but both are 
commercial.




If you're writing a game or any other real-time interactive 
application and need to profile occasional lags, you might need a 
different approach (in this case you won't avoid manual 
instrumentation, although it's rather easy to use):

http://defenestrate.eu/2014/09/05/frame_based_game_profiling.html

https://github.com/kiith-sa/tharsis.prof

