Compiler performance with my ridiculous Binderoo code

Ethan Watson via Digitalmars-d digitalmars-d at puremagic.com
Sun Dec 11 08:26:29 PST 2016


I've been keeping in contact with Stefan and providing him 
example code to test with his CTFE engine. He's been saying for a 
while that templates are slow. So I decided to finally work out 
just how slow we're talking about here.

I can't show the exact code I'm running with, but suffice it to say 
this particular test case crashes the 32-bit dmd.exe that comes 
with the official downloads. I've had to build my own 64-bit 
version... which also eventually crashes but only after consuming 
8 gigabytes of memory.

Using Visual Studio 2015's built-in sample-based profiler, I 
decided to see just what the compiler was doing on a release 
build with the problem code.

http://pastebin.com/dcwwCp28

This is a copy of the calltree where DMD spends most of its time. 
If you don't know how to read these profiles, the thing to note 
is that it's 130+ functions deep in the callstack. Plenty of 
template instances, plenty of static if, plenty of static 
foreach... in other words, I'm doing quite a bit to tax the 
compiler.

Which got me thinking. I've been rolling things over to 
CTFE-generated string mixins lately in anticipation of the 
speedups Stefan will get us. But there's one bit of template code 
that I have not touched at all.

https://github.com/Remedy-Entertainment/binderoo/blob/master/binderoo_client/d/src/binderoo/binding/serialise.d

This is a simple set of templated functions that parses objects 
and serialises them to JSON (the reason I'm not just using 
std.json is because I need custom handling for pointers and 
reference types). But it turns out this is the killer. As a part 
of binding an object for Binderoo's rapid iteration purposes, it 
generates a serialise/deserialise call that instantiates for each 
type found. If I turn that code generation off, the code 
compiles. If I remove a file that has 1000+ structs 
(auto-generated) with tree-like instances embedded in the only 
object I apply Binderoo's binding to in that entire module, it 
compiles in 45% of the time (12 seconds versus 26 seconds).
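
To illustrate why one instantiation per type adds up, here is a 
minimal sketch of the general pattern. It is made up for this post 
and is not the code in serialise.d: the name toJSON, the structs Vec 
and Node, and the pointer handling are all assumptions. The 
pragma(msg) line makes the compiler print once for every distinct 
instantiation, so with 1000+ struct types you would see at least 
that many lines, plus one for each distinct field type.

```d
import std.conv : to;
import std.traits : isPointer, isSomeString;

// Hypothetical sketch, NOT Binderoo's serialise.d.
string toJSON(T)(const T value)
{
    // Printed by the compiler once per distinct instantiation of toJSON.
    pragma(msg, "instantiating toJSON!(" ~ T.stringof ~ ")");

    static if (is(T == struct))
    {
        string result = "{";
        // Compile-time unrolled loop: every field type gets its own
        // toJSON instantiation the first time it is encountered.
        foreach (i, member; value.tupleof)
        {
            static if (i > 0) result ~= ",";
            result ~= "\"" ~ __traits(identifier, T.tupleof[i]) ~ "\":"
                    ~ toJSON(member);
        }
        return result ~ "}";
    }
    else static if (isPointer!T)
    {
        // Assumed custom handling: follow the pointer, or emit null.
        return value is null ? "null" : toJSON(*value);
    }
    else static if (isSomeString!T)
    {
        return "\"" ~ value.to!string ~ "\""; // no escaping in this sketch
    }
    else
    {
        return value.to!string;
    }
}

// Hypothetical types standing in for the auto-generated structs.
struct Vec  { int x; int y; }
struct Node { Vec pos; string name; Node* next; }
```

Calling toJSON(Node(Vec(1, 2), "a", null)) pulls in instantiations 
for Node, Vec, int, string and the pointer type in one go. A 
tree-like type embedding hundreds of other structs does the same 
thing at a much larger scale.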

The hot path without that 1000+ struct file actually goes through 
the AttribDeclaration.semantic and 
UserAttributeDeclaration.semantic code path, with the OS itself 
doing the most work for a single function thanks to 
Outbuffer::writeString needing to realloc string memory in 
dmd\backend\outbuf.c.

The hot path with that 1000+ struct file spends the most time in 
TemplateInstance.semantic, specifically with calls to 
TemplateDeclaration.findExistingInstance, 
TemplateInstance.tryExpandMembers, and 
TemplateInstance.findBestMatch taking up 90%+ of its time. 
findExistingInstance spends most of its time in arrayObjectMatch 
in dtemplate.d, which subsequently spends most of its time in the 
match function in the same file (which calls virtuals on 
RootObject to do comparisons).

At the very least, I now have an idea of which parts of the 
compiler I'm taxing and can attempt to write around that. But I'm 
also tempted to go in and optimise those parts of the compiler.
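
For anyone unfamiliar with the CTFE-generated string mixin approach 
mentioned earlier, this is roughly what "writing around" templates 
can look like. It is only a sketch under my own assumptions: 
generateGetters and Stats are hypothetical names and have nothing to 
do with the Binderoo source. An ordinary function builds D source 
text at compile time and a single mixin pastes it in, so no template 
instance is created per field.

```d
// Hypothetical helper: evaluated at compile time (CTFE) when used in a
// mixin, it returns D source text for one getter per field name.
string generateGetters(string[] names)
{
    string code;
    foreach (name; names)
        code ~= "int get_" ~ name ~ "() const { return " ~ name ~ "; }\n";
    return code;
}

struct Stats
{
    int health = 100;
    int armour = 25;

    // One CTFE call and one mixin generate both getters.
    mixin(generateGetters(["health", "armour"]));
}

// Checked at compile time, so a broken generator fails the build.
static assert(Stats().get_health() == 100);
static assert(Stats().get_armour() == 25);
```

The trade-off is that the work moves from template instantiation to 
the CTFE interpreter, which is why this only pays off if CTFE itself 
is fast.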

