[hackathon] My and Walter's ideas

Andrei Alexandrescu via Digitalmars-d digitalmars-d at puremagic.com
Sat Apr 25 20:40:44 PDT 2015


I've been on this project at work that took the "functionality first, 
performance later" approach. It has a Java-style approach of using class 
objects throughout and allocating objects casually.

So now we have a project that works but is kinda slow. Profiling shows 
it spends a fair amount of time collecting garbage (the heavy allocation 
is easy to see just by reading the code). Yet there is no tooling that 
tells us where most allocations happen.

Since it's trivial to make D applications a lot faster by avoiding 
big-ticket allocations and leaving only the peanuts for the heap, there 
should be a simple tool to e.g. report, at the end of a run, how many 
objects of each type were allocated. This is the kind of tool that should 
be embarrassingly easy to turn on and use to draw great insights about 
the allocation behavior of any application.

First shot is a really simple proof of concept at 
http://dpaste.dzfl.pl/8baf3a2c4a38. I manually replaced all "new 
T(args)" with "make!T(args)" and all "new T[n]" with "makeArray!T(n)". I 
didn't even worry about concatenations and array literals in the first 
approximation.
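
To make the replacement concrete, here is a minimal sketch of what such 
wrappers could look like. It is not the code from the paste above; the 
recordAllocation hook is a hypothetical stand-in, and the only trick 
assumed is that __FILE__, __LINE__ and __FUNCTION__ used as default 
arguments are evaluated at the call site.

```d
import std.stdio : writefln;

// Hypothetical hook: real support code would tally this in a table
// instead of printing it.
void recordAllocation(string typeName, size_t bytes,
                      string file, size_t line, string func)
{
    writefln("%s(%s) in %s: %s, %s bytes", file, line, func, typeName, bytes);
}

// Drop-in for `new T(args)` where T is a class. The defaulted template
// parameters pick up the caller's location; Args is deduced from the call.
T make(T, string file = __FILE__, size_t line = __LINE__,
       string func = __FUNCTION__, Args...)(auto ref Args args)
    if (is(T == class))
{
    recordAllocation(T.stringof, __traits(classInstanceSize, T),
                     file, line, func);
    return new T(args);
}

// Drop-in for `new T[n]`. Here the location arrives through defaulted
// function parameters.
T[] makeArray(T)(size_t n, string file = __FILE__, size_t line = __LINE__,
                 string func = __FUNCTION__)
{
    recordAllocation(T.stringof ~ "[]", n * T.sizeof, file, line, func);
    return new T[n];
}

class Point
{
    int x, y;
    this(int x, int y) { this.x = x; this.y = y; }
}

void main()
{
    auto p = make!Point(3, 4);      // was: new Point(3, 4)
    auto a = makeArray!double(16);  // was: new double[16]
    assert(p.x == 3 && p.y == 4);
    assert(a.length == 16);
}
```

The call sites change only in spelling, which is what makes the manual 
search-and-replace practical.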

The support code collects in a thread-local table the locus of each 
allocation (file, line, and function of the caller) along with the 
type created. Total bytes allocated for each locus are tallied.
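
A rough sketch of such a thread-local table, under my own naming (Locus, 
localBytes and tally are illustrative, not taken from the paste): 
module-level variables in D are thread-local by default, so the per-thread 
table needs no locking.

```d
import std.stdio : writefln;

// Key: where an allocation happened and what type it created.
struct Locus
{
    string file;
    size_t line;
    string func;
    string type;
}

// Thread-local by default: every thread gets its own copy of this table.
size_t[Locus] localBytes;

// Hypothetical helper: add `bytes` to the running total for the caller's
// locus. The defaulted parameters are filled in at the call site.
void tally(T)(size_t bytes, string file = __FILE__, size_t line = __LINE__,
              string func = __FUNCTION__)
{
    localBytes[Locus(file, line, func, T.stringof)] += bytes;
}

void main()
{
    foreach (i; 0 .. 3)
        tally!int(4 * int.sizeof);   // same source line: a single locus
    tally!double(double.sizeof);     // different line and type: second locus

    assert(localBytes.length == 2);
    foreach (locus, bytes; localBytes)
        writefln("%s(%s) %s: %s, %s bytes",
                 locus.file, locus.line, locus.func, locus.type, bytes);
}
```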

When a thread exits, its table is dumped wholesale into a global table, 
which is synchronized. It's fine to use a global lock because the global 
table is only updated when a thread exits, not with each increment.

When the process exits, the global table is printed out.
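
The two exit steps could be sketched with module destructors, assuming 
(this is my guess at a structure, not the paste's code) a thread-local 
"static ~this" for the per-thread dump and a "shared static ~this" for 
the final printout. flushLocal, localBytes and globalBytes are 
hypothetical names.

```d
import core.thread : Thread;
import std.algorithm : sort;
import std.array : array;
import std.stdio : writefln;

struct Locus { string file; size_t line; string func; string type; }

size_t[Locus] localBytes;             // thread-local by default
__gshared size_t[Locus] globalBytes;  // one table for the whole process

// Fold the calling thread's table into the global one under a lock.
// The lock is taken once per thread, not once per allocation.
void flushLocal()
{
    synchronized
    {
        foreach (locus, bytes; localBytes)
            globalBytes[locus] += bytes;
    }
    localBytes = null;  // a second flush then adds nothing
}

// Runs in each thread as that thread exits.
static ~this() { flushLocal(); }

// Runs once at process exit: print loci, biggest first.
shared static ~this()
{
    auto rows = globalBytes.byKeyValue.array;
    rows.sort!((a, b) => a.value > b.value);
    foreach (row; rows)
        writefln("%s(%s) %s: %s, %s bytes", row.key.file, row.key.line,
                 row.key.func, row.key.type, row.value);
}

void main()
{
    // Made-up locus, for illustration only.
    auto site = Locus("app.d", 10, "app.work", "Widget");

    auto worker = new Thread({ localBytes[site] += 64; });
    worker.start();
    worker.join();                    // worker's table has been flushed
    assert(globalBytes[site] == 64);

    localBytes[site] += 32;           // main thread's share, merged at exit
}
```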

This was extraordinarily informative, essentially taking us from "well 
let's grep for new and reduce those, and replace class with struct where 
sensible" to a much more focused approach that targeted the top 
allocation sites. The distribution is Pareto, e.g. the locus with most 
allocations accounts for four times more bytes than the second, and the 
top few are responsible for statistically all allocations that matter. 
I'll post some sample output soon.

Walter will help me with hooking places that allocate in the runtime 
(the new operator, concatenations, array literals, etc.) to allow 
building this into druntime. At the end we'll write an article about all 
this.


Andrei

