String Switch Lowering

H. S. Teoh hsteoh at quickfur.ath.cx
Thu Jan 25 18:41:55 UTC 2018


On Thu, Jan 25, 2018 at 07:21:29PM +0100, Benjamin Thaut via Digitalmars-d wrote:
> _D6object__T8__switchTyaVxAyaa7_[...snip ridiculously long symbol...]
> 
> The first time I encountered this symbol in phobos I thought: WTF? Then
> I tried to demangle it:
> core.exception.RangeError at src\core\demangle.d(230): Range violation

LOL! This reminds me of the days before Rainer's symbol backreferencing
PR was merged, when a UFCS chain of range algorithms would cause
exponential growth in symbol length. This one isn't exponential, but it
*is* still ridiculously long.  We need to fix it. :-D


> I was then quickly informed by Rainer Schuetze what the correct
> demangling for this symbol is:
> 
> pure nothrow @nogc @safe int object.__switch!(immutable(char), "CST6CDT",
> "EST5EDT", "Etc/GMT", "MST7MDT", "PST8PDT", "Asia/Aden", "Asia/Baku",
[... snip ridiculously long template argument list ...]
> "America/Argentina/Rio_Gallegos",
> "America/North_Dakota/New_Salem").__switch(scope const(immutable(char)[]))
> 
> So I was thinking to myself: Is it really a good idea to lower string
> switches to a template if it results in such symbols? This symbol
> alone takes 17815 Bytes.

I think this is part of a much larger issue that we need to tackle, that
is, long template argument lists (esp. since D allows you to directly
manipulate these lists aka tuples aka AliasSeq, so user code is liable
to generate large numbers of these things with potentially very long
lengths).

I don't have a clear solution yet, but my initial thought is that in
cases like these, where the list is basically unique, all that's
*really* required of the generated symbol is that it be unique. There is
really no need to encode every last detail into the symbol name; the
first 1000 bytes or so of the symbol are probably already enough to
disambiguate it from every other symbol in the program.  If we
could somehow detect or annotate these cases as merely requiring a
unique symbol, then we could just substitute the whole monstrous thing
with a hash, like an MD5 or SHA checksum, which will be much less than
100 bytes.
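
As a rough sketch of the hashing idea (shortenSymbol and its `keep`
parameter are made up for illustration, this is not how dmd mangles
anything today): keep a readable prefix of the mangled name and replace
the tail with a fixed-length MD5 digest of the whole thing.

```d
import std.digest : toHexString;
import std.digest.md : md5Of;

/// Hypothetical helper, not part of dmd: keep the first `keep` bytes of
/// a mangled name and append the hex MD5 digest of the full name.
string shortenSymbol(string mangled, size_t keep = 32)
{
    // 32 is the length of an MD5 digest written as hex characters.
    if (mangled.length <= keep + 32)
        return mangled;             // short enough already, leave it alone

    auto digest = md5Of(mangled);   // ubyte[16], hash of the *whole* name
    auto hex = toHexString(digest); // char[32]
    return mangled[0 .. keep] ~ hex[].idup;
}
```

With the default `keep`, every shortened name is 64 bytes no matter how
long the template argument list was. The obvious cost is that such a
name can no longer be demangled back into the original argument list.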


> If we think this is a good idea, should we rewrite this particular
> string switch to use an associative array instead to avoid the overly
> long symbol name?
[...]

I believe the original idea behind using a template for string switches
was to allow the possibility for the implementation to be smarter about
how to implement the switch (IIRC, string switches used to be
implemented as a runtime function). Supposedly object.__switch could
analyze the list of strings at compile-time and generate a perfect hash
or something, to maximize runtime performance.
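
To illustrate what "analyze the list of strings at compile-time" could
look like, here is a minimal sketch. It is *not* druntime's
object.__switch; findCase and sortedCases are hypothetical names, and it
uses a sorted table plus binary search rather than a perfect hash. It
assumes at least one case string is given.

```d
/// Plain insertion sort, simple enough to run in CTFE.
string[] sortedCases(string[] a)
{
    foreach (i; 1 .. a.length)
        for (size_t j = i; j > 0 && a[j] < a[j - 1]; --j)
        {
            auto t = a[j];
            a[j] = a[j - 1];
            a[j - 1] = t;
        }
    return a;
}

/// Hypothetical stand-in for a templated string switch: returns the
/// index of `s` in the *sorted* case list, or -1 if `s` is not a case.
int findCase(cases...)(scope const(char)[] s)
{
    // The initializer of a static immutable is evaluated at compile
    // time, so the sorting costs nothing at runtime.
    static immutable string[] table = sortedCases([cases]);

    size_t lo = 0, hi = table.length;
    while (lo < hi)
    {
        immutable mid = lo + (hi - lo) / 2;
        if (table[mid] == s)
            return cast(int) mid;
        if (table[mid] < s)
            lo = mid + 1;
        else
            hi = mid;
    }
    return -1;
}
```

Note that this sketch has exactly the problem under discussion: every
case string is a template argument, so the mangled name of the
instantiation grows with the length of the case list.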

IMO the real fix ought to be to make the compiler somehow recognize
these cases and generate shorter symbols for them, rather than
hard-coding the Phobos code to use AAs, though admittedly, the latter
may well be a necessary stop-gap measure in the meantime.
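
For reference, the AA stop-gap would look roughly like the sketch
below. The Zone enum, zoneByName and lookupZone are invented for
illustration and are not the actual std.datetime code; only the two
zone names are taken from the symbol quoted above.

```d
// Hypothetical result type for the lookup.
enum Zone { unknown, cst6cdt, est5edt }

// The table is filled once at program startup instead of being encoded
// in a template argument list, so no long symbol is generated for it.
immutable Zone[string] zoneByName;

shared static this()
{
    zoneByName = [
        "CST6CDT": Zone.cst6cdt,
        "EST5EDT": Zone.est5edt,
    ];
}

/// Replaces `switch (name) { case "CST6CDT": ... }` with a hash lookup.
Zone lookupZone(string name)
{
    if (auto p = name in zoneByName)
        return *p;
    return Zone.unknown;
}
```

The trade-off is that the table is built at runtime in a module
constructor, and the compiler no longer gets a chance to pick a smarter
lookup strategy for the fixed set of strings.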

(On which note, I wonder if you may have inadvertently found the source
of my recent dmd memory usage woes... a symbol like this in a commonly
imported module in Phobos like std.datetime would explain why recently I
suddenly can't compile Phobos anymore on a low-memory system without
invoking the kernel OOM killer, or why even the most trivial of projects
take ridiculous amounts of memory to compile.)


T

-- 
The volume of a pizza of thickness a and radius z can be described by the following formula: pi zz a. -- Wouter Verhelst

