Compilation times and idiomatic D code

H. S. Teoh via Digitalmars-d digitalmars-d at puremagic.com
Fri Jul 14 16:30:18 PDT 2017


On Fri, Jul 14, 2017 at 03:45:44PM -0700, H. S. Teoh via Digitalmars-d wrote:
[...]
> Here's a further update to the saga of combating ridiculously large
> symbol sizes.
[...]
> 	     .wrapWithInterface	// <--- type erasure happens here
[...]

Some further thoughts about type erasure and UFCS chains.

Nobody says type erasure *requires* OO -- that's just an artifact of the
way things are currently implemented.  Consider, for example, your
generic UFCS chain:

	auto r = source
		.ufcsFunc1
		.ufcsFunc2
		.ufcsFunc3
		...
		.ufcsFuncn;

>From the POV of outside code, *nobody cares* what the specific types of
each stage of the UFCS pipeline are.  Only the code that implements
ufcsFunc1, ufcsFunc2, etc., need to know.  Furthermore, suppose X is the
specific type of the range that's returned by ufcsFunc3, in the middle
of the pipeline.  What are the chances this exact type is going to be
reused again later?  Very slim.  And if ufcsFunc2, say, takes an alias
parameter that's instantiated with a lambda function, you can basically
guarantee that this type X will *never* ever be repeated again, outside
of this specific UFCS chain.

And at the end of the chain, typeof(r) really only needs to encode the
fact that it implements the API returned by ufcsFuncn, e.g., the range
methods, element type, etc.. But nobody will care how these API
functions are implemented under the hood -- as far as outside code is
concerned, typeof(r) might as well be an opaque serial number, and we'll
be able to find the instantiated API functions that implement the
methods of r.  IOW, the explicit types of all intermediate stages of the
UFCS chain are irrelevant to the outside world.  Since they are template
functions (most of the time) anyway, the compiler might as well just
glue all their code together, and put it inside functions named like
funcName0012345XYZ where the 0012345XYZ is some random unique string,
basically a serial number, that identifies it as a method of r.

For that matter, typeof(r) might as well be ufcsChain0012345XYZ, since
nobody outside the pipeline itself will care what its component types
are.

So basically, the compiler could just automatically type-erase the type
of such chains, given that (1) none of the UFCS functions "escape"
knowledge about an intermediate type to the outside world, (2) the chain
is essentially unique to the current function (esp. if one or more
components of the chain has an alias parameter bound to a lambda
argument), (3) most UFCS functions return opaque types anyway -- very
few (i.e., almost none) Phobos algorithms, for example, allow you to
recover the type of the previous component of the chain -- so it's not
relevant what the specific types of the previous components are, and (4)
therefore the outside world couldn't care less about the specific types
that compose typeof(r).  It could be as simple as bumping a global
counter and suffixing it to some compiler-reserved prefix, to make type
names like __ufcsType_00000123.  This will be far shorter than any
symbol compression applied to present-day symbols!

To summarize, here's a scheme for handling template expansion in UFCS
chains that will produce extremely short symbols, *independently* of how
long your UFCS chains are.  If the compiler sees a UFCS chain with these
characteristics:

1) There are ≥ n components, for some arbitrary value of n (say n=5 or
n=10, or something like that, so that we don't bother with trivial cases
where the full type may actually be important);

2) Each component of the chain is a template function (this one is
almost a given);

3) At least one component guarantees a unique instantiation, e.g., an
alias parameter bound to a lambda function;

then the compiler will:

a) Generate a unique serial number (package.module + global counter
should be good enough);

b) Assign a unique typename to the final return value, constructed using
this serial number;

c) Essentially inline all template bodies of previous stages of the
pipeline into the last one, using the same serial number to generate
names for the aggregated symbols of the final stage.

This way, those ridiculously huge symbols aren't even generated to begin
with, thereby alleviating the need for convoluted compression schemes.


T

-- 
I am not young enough to know everything. -- Oscar Wilde


More information about the Digitalmars-d mailing list