Compilation times and idiomatic D code

H. S. Teoh via Digitalmars-d digitalmars-d at puremagic.com
Wed Jul 5 13:12:40 PDT 2017


Over time, what is considered "idiomatic D" has changed, and nowadays it
seems to be leaning heavily towards range-based code with UFCS chains
using std.algorithm and similar reusable pieces of code.

D (well, DMD specifically) is famed for its lightning-fast compilation
times.

So this left me wondering why my latest D project, a smallish codebase
with only ~5000 lines of code, a good part of which is comments, takes
about 11 seconds to compile.

A first hint is that these meager 5000 lines of code compile to a 600MB
executable. Well, large executables have been the plague of D since the
beginning, but the reasoning has always been that hello world examples
don't really count, because the language offers the machinery for much
more than that, and the idea is that as the code size grows, the "bloat
to functionality" ratio decreases.  But still... 600MB for 5000 lines of
code seems a bit excessive. Especially when stripping symbols cut off
about *half* of that size.

Which leads to the discovery, to my horror, that there are some very,
VERY large symbols that are generated. Including one that's 388881
characters long. Yes, that's almost 400KB just for ONE symbol.  This
particular symbol is the result of a long UFCS chain in the main
program, and contains a lot of repeated elements, like
myTemplate__lambdaXXX_myTemplateArguments__mapXXX__Result__myTemplateArguments
and so on.  Each additional member in the UFCS chain causes a repetition
of all the previous members' return type names, plus the new typename,
causing an O(n^2) explosion in symbol size.

Worse yet, because the typename encoded in this monster symbol is a
range, you have the same 300+KB of typename repeated for each of the
range primitives. And anything else this typename happens to be a
template argument to.  There's another related symbol that's 388944
characters long.  Not to mention all the range primitives (along with
their similarly huge typenames) of all the smaller types contained
within this monster typename.

Given this, it's no surprise that the compiler took 11 seconds to
compile a 5000-line program. Just imagine how much time is spent
generating these huge symbols, storing them in the symbol table,
comparing them in symbol table lookups, writing them to the executable,
etc.  And we're not even talking about the other smaller, but still
huge symbols that are also present -- 100KB symbols, 50KB symbols, 10KB
symbols, etc.  And think about the impact of this on the compiler's
memory footprint.

IOW, the very range-based idiom that has become one of the defining
characteristics of modern D is negating the selling point of fast
compilation.

I vaguely remember there was talk about compressing symbols when they
get too long... is there any hope of seeing this realized in the near
future?


T

-- 
War doesn't prove who's right, just who's left. -- BSD Games' Fortune

