Compilation times and idiomatic D code

Mon Jul 17 09:00:07 PDT 2017

On 7/14/17 7:30 PM, H. S. Teoh via Digitalmars-d wrote:
> On Fri, Jul 14, 2017 at 03:45:44PM -0700, H. S. Teoh via Digitalmars-d wrote:
> [...]
>> Here's a further update to the saga of combating ridiculously large
>> symbol sizes.
> [...]
>> 	     .wrapWithInterface	// <--- type erasure happens here
> [...]
> 
> Some further thoughts about type erasure and UFCS chains.
> 
> Nobody says type erasure *requires* OO -- that's just an artifact of the
> way things are currently implemented.  Consider, for example, your
> generic UFCS chain:
> 
> 	auto r = source
> 		.ufcsFunc1
> 		.ufcsFunc2
> 		.ufcsFunc3
> 		...
> 		.ufcsFuncn;
> 
> From the POV of outside code, *nobody cares* what the specific types of
> each stage of the UFCS pipeline are.  Only the code that implements
> ufcsFunc1, ufcsFunc2, etc., need to know.  Furthermore, suppose X is the
> specific type of the range that's returned by ufcsFunc3, in the middle
> of the pipeline.  What are the chances this exact type is going to be
> reused again later?  Very slim.  And if ufcsFunc2, say, takes an alias
> parameter that's instantiated with a lambda function, you can basically
> guarantee that this type X will *never* ever be repeated again, outside
> of this specific UFCS chain.

I think this is slightly flawed thinking.

Types can certainly be repeated. Even long chains of UFCS functions may 
be repeated elsewhere. Therefore, you need to have these characteristics:

1. The compiler can generate the same symbol given the same compile-time 
parameters. Separate compilation dictates this is a necessity.
2. This CANNOT depend on position inside the module (i.e. no line 
numbers or global counters). It would be too surprising for it to be a 
linker error to reorder functions in a module.
3. The mangled symbol should be reversible back to the original symbol name.
4. The symbol should make sense to the viewer.

Note that 3 and 4 are more "nice to have" than "essential", as you can 
certainly compile, link, and run without ever having to print a symbol name.

Thinking about this, I've had a couple of interesting thoughts. First, 
where this really matters (i.e. where you get really long symbol names) 
is with symbols defined inside template functions. I don't need to rehash 
this, as you all know it.

But the names of those definitions also depend on the function's runtime 
parameter types, which ironically are usually a repeat of the template 
parameters. Hence the huge bloat.

But what if we forgo that requirement of using the runtime parameters? 
That is:

// prototypical UFCS wrapper
auto foo(Args...)(Args args)
{
   static struct Result { ... }
   return Result(args);
}

Instead of Result being typed as (e.g.):
   foo!(int, string, bool)(int, string, bool).Result

it really is
   foo!(int, string, bool).Result
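
To look at the name in question on a real compiler, one option is to 
print .mangleof for the returned type. This is only a sketch: the body of 
Result (a single tuple field called `stored`) and the sample arguments 
are made up for illustration, and the exact mangled string will vary by 
compiler version, so none is shown here.

```d
import std.stdio : writeln;

// Prototypical UFCS wrapper. The contents of Result are a placeholder
// invented for this sketch; it simply stores the arguments.
auto foo(Args...)(Args args)
{
    static struct Result { Args stored; }
    return Result(args);
}

void main()
{
    auto r = foo(1, "two", true);

    // The mangled name the compiler emits for the nested type. Because
    // Result is declared inside the function body, its name is qualified
    // by the enclosing function instance.
    writeln(typeof(r).mangleof);

    // The short, human-readable name of the type.
    writeln(typeof(r).stringof);
}
```

Feeding the result of one such call into another is where the repetition 
of template and runtime parameter types starts to compound.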

What happens? In essence, this is the "horcrux" solution that I came up 
with. However, there is one case where it doesn't work. And that is, 
when you have foo overloaded only on runtime parameters:

auto foo(T)(T t) { struct Result { ... } ... }
auto foo(T)(T t, string s) { struct Result { ... } ... }

If we define Result the way I suggest, then both have the same name 
(foo!(int).Result) even though they are different structs.

If you use horcrux on this right now, you actually hit a bug, as only the 
first Result definition is considered 
(https://issues.dlang.org/show_bug.cgi?id=17653).
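
For anyone who missed the earlier part of the thread, a minimal sketch of 
what I mean by "horcrux", as I understand the technique: the struct is 
hoisted out of the function body and declared next to the function inside 
an eponymous template, so it is scoped by the template instance alone. 
The single `value` field and the explicit foo!int instantiation are 
assumptions made for the example.

```d
import std.stdio : writeln;

// "Horcrux" form: Result is a sibling of the function inside the
// template, rather than a declaration nested in the function body.
template foo(T)
{
    struct Result { T value; }   // placeholder body for this sketch

    auto foo(T t) { return Result(t); }
}

void main()
{
    auto r = foo!int(42);

    // Result now belongs to the template instance foo!(int), not to a
    // particular function signature.
    writeln(typeof(r).mangleof);
    writeln(r.value);
}
```

Note that this sketch has a single foo. Adding a second template foo(T) 
that differs only in runtime parameters is exactly the case that runs 
into the bug linked above.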

But I would propose that we just make this an error. That is, you just 
have to rename the second one Result2, or something else.
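
As a sketch of what the rename would look like for the two overloads 
above (the struct fields are invented for the example; only the names 
Result and Result2 come from the proposal):

```d
import std.stdio : writeln;

// First overload keeps the name Result.
auto foo(T)(T t)
{
    static struct Result { T value; }
    return Result(t);
}

// Second overload differs only in its runtime parameters, so its nested
// struct gets a distinct name instead of a second "Result".
auto foo(T)(T t, string s)
{
    static struct Result2 { T value; string label; }
    return Result2(t, s);
}

void main()
{
    auto a = foo(1);
    auto b = foo(1, "x");

    // Two different types, each with a name that is unique without
    // needing the runtime parameter list to tell them apart.
    writeln(typeof(a).stringof);
    writeln(typeof(b).stringof);
}
```

This compiles today, so code written this way would not be affected by 
the proposed error.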

1. This would break some code. Not much, but some rare examples. Most 
UFCS code has one definition of a function, or multiple definitions, but 
with different template parameters.

2. It would solve the exponential problem, as now only the template 
definition is considered, so the growth of the symbol is linear.

3. The savings from Rainer's patch can still be additive.

Thoughts?

-Steve