Compiler performance with my ridiculous Binderoo code
Ethan Watson via Digitalmars-d
digitalmars-d at puremagic.com
Sun Dec 11 08:26:29 PST 2016
I've been keeping in contact with Stefan and providing him
example code to test with his CTFE engine. He's been saying for a
while that templates are slow. So I decided to finally work out
just how slow we're talking about here.
I can't show the exact code I'm running with, but suffice it to
say this particular test case crashes the 32-bit dmd.exe that comes
with the official downloads. I've had to build my own 64-bit
version... which also eventually crashes but only after consuming
8 gigabytes of memory.
Using Visual Studio 2015's built-in sample-based profiler, I
decided to see just what the compiler was doing on a release
build with the problem code.
http://pastebin.com/dcwwCp28
This is a copy of the calltree where DMD spends most of its time.
If you don't know how to read these profiles, the main thing to
notice is that it's 130+ functions deep in the callstack. Plenty
of template instances, plenty of static if, plenty of static
foreach... I'm doing quite a bit to tax the compiler, in other
words.
Which got me thinking. I've been rolling things over to
CTFE-generated string mixins lately in anticipation of the
speedups Stefan will get us. But there's one bit of template code
that I have not touched at all.
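
For anyone unfamiliar with the technique, here is a minimal sketch of a
CTFE-generated string mixin. It is not Binderoo code: makeGetters and
Config are hypothetical names invented for this illustration. An ordinary
function builds source text, the compiler evaluates it at compile time,
and the result is compiled as if written by hand, with no template
instantiation involved.

```d
// Hypothetical helper (not from Binderoo): builds the source text for
// one getter per field name. It is a plain function, so it can run
// under CTFE as well as at runtime.
string makeGetters(string[] names)
{
    string code;
    foreach (name; names)
        code ~= "int get_" ~ name ~ "() { return " ~ name ~ "; }\n";
    return code;
}

struct Config
{
    int width = 640;
    int height = 480;

    // makeGetters is evaluated at compile time; the returned string is
    // then parsed and compiled as member declarations of Config.
    mixin(makeGetters(["width", "height"]));
}

void main()
{
    Config c;
    assert(c.get_width() == 640);
    assert(c.get_height() == 480);
}
```

The trade-off is that the work moves from the template engine to the CTFE
interpreter, so it only pays off if CTFE is the faster of the two.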
https://github.com/Remedy-Entertainment/binderoo/blob/master/binderoo_client/d/src/binderoo/binding/serialise.d
This is a simple set of templated functions that parses objects
and serialises them to JSON (the reason I'm not just using
std.json is because I need custom handling for pointers and
reference types). But it turns out this is the killer. As a part
of binding an object for Binderoo's rapid iteration purposes, it
generates a serialise/deserialise call that instantiates for each
type found. If I turn that code generation off, the code
compiles. If I remove a file that has 1000+ auto-generated
structs, with tree-like instances embedded in the only object I
apply Binderoo's binding to in that entire module, it compiles in
about 45% of the time (12 seconds versus 26 seconds).
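
To illustrate why the struct count matters, here is a simplified sketch of
a per-type templated serialiser. It is an assumption-laden toy, not the
code in serialise.d: toJson, Vec2 and Node are hypothetical names. The
point is that every distinct type reachable from the root object gets its
own template instance, so 1000+ structs means at least that many
instantiations for the compiler to create and match.

```d
import std.conv : to;
import std.traits : isPointer;

struct Vec2 { int x; int y; }
struct Node { int id; Vec2 pos; Node* next; }

// Hypothetical serialiser: one instance is created per distinct T.
string toJson(T)(T value)
{
    static if (is(T == struct))
    {
        string result = "{";
        // foreach over .tupleof is unrolled at compile time; each
        // member type triggers its own toJson instantiation.
        foreach (i, member; value.tupleof)
        {
            static if (i > 0) result ~= ",";
            result ~= `"` ~ __traits(identifier, T.tupleof[i]) ~ `":`;
            result ~= toJson(member);
        }
        return result ~ "}";
    }
    else static if (isPointer!T)
    {
        // Custom pointer handling: emit null or follow the pointer.
        return value is null ? "null" : toJson(*value);
    }
    else
    {
        return value.to!string;
    }
}

void main()
{
    auto n = Node(1, Vec2(3, 4), null);
    assert(toJson(n) == `{"id":1,"pos":{"x":3,"y":4},"next":null}`);
}
```

Serialising a Node here instantiates toJson for Node, int, Vec2 and
Node*. A tree of auto-generated structs multiplies that accordingly.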
The hot path without that 1000+ struct file actually goes through
the AttribDeclaration.semantic and
UserAttributeDeclaration.semantic code path, with the OS itself
doing the most work for a single function thanks to
Outbuffer::writeString needing to realloc string memory in
dmd\backend\outbuf.c.
The hot path with that 1000+ struct file spends the most time in
TemplateInstance.semantic, specifically with calls to
TemplateDeclaration.findExistingInstance,
TemplateInstance.tryExpandMembers, and
TemplateInstance.findBestMatch taking up 90%+ of its time.
findExistingInstance spends most of its time in arrayObjectMatch
in dtemplate.d, which subsequently spends most of its time in the
match function in the same file (which calls virtuals on
RootObject to do comparisons).
At the very least, I now have an idea of which parts of the
compiler I'm taxing and can attempt to write around that. But I'm
also tempted to go in and optimise those parts of the compiler.