Compilation times and idiomatic D code

Sat Jul 15 08:58:12 PDT 2017

On Saturday, July 15, 2017 11:10:32 Enamex via Digitalmars-d wrote:
> - What type information are being kept because of UFCS chains?
> Doesn't that mechanism simply apply overload resolution then
> choose between the prefix and .method forms as appropriate,
> rewriting the terms?
>      Then it's a problem of function invocation. I don't get
> what's happening here still. Does this tie to the Voldemort types
> problem? (=> are many of the functions in the chain returning
> custom types?) It doesn't make sense, especially if, from your
> indirection workaround, it looks like it would work around the
> same but without the bloat problem if we unlinked the chain into
> many intermediate temporary parts. So how is this a type
> information issue?

UFCS is irrelevant to all of this. That just so happens to be how he's
calling the functions. You'd get the same issue if you called all of the
functions in a chain with the traditional function call syntax rather than
treating them as a bunch of member functions. The issue has to do with how
each invocation of a range-based function tends to result in a new template
instantiation, and it's common practice in D to chain a bunch of templated
function calls together. For instance, if you have

    int[] a;
    auto b = a.map!(a => a / 2)();
    pragma(msg, typeof(b));

then it prints out

    MapResult!(__lambda1, int[])
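
To make the UFCS point concrete, here is a minimal sketch. The named function half is a stand-in I've made up so that both calls share the same template argument (with two separate lambdas, the types would differ only in which lambda they name). It should show that the member-style call and the traditional call produce the same instantiation:

```d
import std.algorithm : map;

// Hypothetical helper, not part of the original example.
int half(int x) { return x / 2; }

void main()
{
    int[] a;
    auto viaUfcs = a.map!half();   // UFCS / member-call form
    auto viaCall = map!half(a);    // traditional function-call form

    // Same template with the same arguments, so the same type either way.
    static assert(is(typeof(viaUfcs) == typeof(viaCall)));
    pragma(msg, typeof(viaUfcs));
}
```

So the call syntax has no bearing on which type gets instantiated.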

If you have

    int[] a;
    auto b = a.map!(a => a / 2)().map!(a => a)();
    pragma(msg, typeof(b));

then it prints out

    MapResult!(__lambda2, MapResult!(__lambda1, int[]))

If you have

    int[] a;
    auto b = a.map!(a => a / 2)().map!(a => a)().filter!(a => a < 7)();
    pragma(msg, typeof(b));

then it prints out

    FilterResult!(__lambda3, MapResult!(__lambda2, MapResult!(__lambda1, int[])))

The type is getting progressively longer and more complicated, because when
the function template is instantiated, it's instantiated with the type from
the previous function call, so it's wrapping the previous type, and you get
a new type that has the name of the type of its argument embedded in it.
It's like if you keep wrapping a container inside another container.

    Array!int a;
    Array!(Array!int) b;
    Array!(Array!(Array!int)) c;
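
If you want to watch the growth rather than take my word for it, .mangleof gives you the mangled name of a type at compile time. A rough sketch, reusing the map/filter chain from above (the variable names are mine, and the actual lengths depend on the compiler version and the module name, so I'm not quoting any numbers):

```d
import std.algorithm : filter, map;

void main()
{
    int[] a;
    auto one   = a.map!(x => x / 2)();
    auto two   = one.map!(x => x)();
    auto three = two.filter!(x => x < 7)();

    // Print the length of each type's mangled name during compilation.
    pragma(msg, typeof(one).mangleof.length);
    pragma(msg, typeof(two).mangleof.length);
    pragma(msg, typeof(three).mangleof.length);

    // Each wrapper embeds the type it wraps, so its name should be longer.
    static assert(typeof(one).mangleof.length < typeof(two).mangleof.length);
    static assert(typeof(two).mangleof.length < typeof(three).mangleof.length);
}
```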

The first level or two may not be so bad, but pretty quickly, it gets quite
ridiculous. The fact that we use auto and/or the name of the template
parameter in range-based code completely hides the fact that the type we're
operating on has a very long and hideous name. So, for the most part, you
don't care, but you do get ridiculously long symbol names underneath the
hood, and the compiler and linker have to deal with them, and it becomes a
performance problem. And as bad as this example is, map doesn't actually use
Voldemort types. MapResult is declared outside of map as a private struct.
The extra information that gets encoded in the symbol name with a Voldemort
type makes those names _much_ worse, though interestingly enough, the compiler
doesn't print all of that out. e.g.

    int[][] a;
    auto b = a.joiner([5]);
    pragma(msg, typeof(b));

just prints

    Result

which doesn't look bad even though things are apparently much uglier
underneath the hood.
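
One way to peek underneath the hood is to compare what the compiler prints for the type with the type's mangled name. The sketch below does that for the joiner example and, as an assumption of mine for illustration, adds a hand-rolled function called wrap that returns a Voldemort type, so you can see how the lengths behave as the calls nest. I'm deliberately not quoting numbers, since they depend on the compiler version:

```d
import std.algorithm : joiner;

// Hypothetical wrapper (not from Phobos) returning a Voldemort type:
// Result can only be named inside wrap, so its mangled name has to
// carry wrap's own instantiation along with it.
auto wrap(R)(R r)
{
    static struct Result { R source; }
    return Result(r);
}

void main()
{
    int[][] a;
    auto b = a.joiner([5]);
    pragma(msg, typeof(b).stringof);         // the short printed name
    pragma(msg, typeof(b).mangleof.length);  // length of the real symbol name
    static assert(typeof(b).mangleof.length > typeof(b).stringof.length);

    int[] c;
    auto w1 = wrap(c);
    auto w2 = wrap(w1);
    auto w3 = wrap(w2);
    pragma(msg, typeof(w1).mangleof.length);
    pragma(msg, typeof(w2).mangleof.length);
    pragma(msg, typeof(w3).mangleof.length);
}
```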

But the fact that working with ranges results in a whole bunch of chained
invocations of function templates tends to result in much larger symbol
sizes than you'd get with code that wasn't so generic. And the fact that
Voldemort types have been pushed as a great way to encapsulate the range
types that result from function calls (types which really shouldn't be used
by name anyway) makes the situation that much worse. So, idiomatic D code
currently results
in very large symbol names, and while in many cases, you don't notice, some
people (like H.S. Teoh) _are_ noticing, and it's becoming a problem. As I
understand it, Weka has quite a few problems with this. Fortunately, a fix
to condense symbol names should be along shortly, which will help the
situation considerably, but in general, we really need to look at how we can
make some of these common D idioms do better in terms of symbol size.

- Jonathan M Davis