Recommendations on avoiding range pipeline type hell

Sun May 16 13:35:02 UTC 2021

On Sunday, 16 May 2021 at 12:54:19 UTC, Chris Piker wrote:
>    a = b; // Lambdas as arguments instead of types works

Wait a sec, when you do the

```d
auto a = S!(a => a*2)();
```

That's not actually passing a type. That's passing the (hidden) 
name of an on-the-spot-created function template as a compile-time 
parameter. If it was just a type, you'd probably be ok!

It is that on-the-spot-created bit that trips you up. The compiler 
doesn't bother looking at the content of the lambda to see if it 
repeats something from earlier. It just sees that shorthand 
syntax and blindly expands it to:

```
static int __some_internal_name_created_on_line_5(int a) {
    return a*2;
}
```

(even that isn't entirely true, because you didn't specify the 
type of `a` in your example, meaning the compiler actually passes

```
template __some_internal_name_created_on_line_5(typeof_a) {
    auto __some_internal_name_created_on_line_5(typeof_a a) {
       return a*2;
    }
}
```

and that *template* is instantiated with the type the range 
passes to the lambda - inside the range's implementation - to 
create the *actual* function that it ends up calling.

but that's not really important to the point since you'd get the 
same thing even if you did specify the types in this situation.)

Anyway, when you repeat it later, it makes *another*:

```
static int __some_internal_name_created_on_line_8(int a) {
    return a*2;
}
```

And passes that. Wanna know what's really nuts? If it is made in 
the context of another template, even being on the same line 
won't save you from duplicates. It creates a new copy of the 
lambda for each and every distinct context it sees. Same thing in 
a different object? Another function. Different line? Another 
function. Different template argument in the surrounding 
function? Yet another function.

At my day job, one little `(a,b) => a < b` sorting lambda once 
exploded to *two gigabytes* of generated identical functions in 
the compiler's memory, and over 100 MB in the generated object 
files. Simply moving that out to a top-level function eliminated 
all that bloat... most of us could barely believe such a little 
thing had such a profound impact.
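
To make the "different template argument, yet another function" point concrete, here is a minimal sketch. The names `doubled`, `doubledNamed` and `twice` are made up for illustration, and the first `static assert` reflects the compiler behavior described in this post, so it may stop holding if the implementation ever learns to merge duplicates.

```d
import std.algorithm : map;
import std.range : iota;

// Hypothetical helper: the lambda lives inside a template, so every
// instantiation of doubled!T gets its own private copy of it.
auto doubled(T)(T unused)
{
    return iota(0, 3).map!(a => a * 2);
}

// Same pipeline, but the lambda is hoisted to a named top-level function.
int twice(int a) { return a * 2; }

auto doubledNamed(T)(T unused)
{
    return iota(0, 3).map!twice;
}

// Different surrounding template argument => different lambda => different type.
static assert(!is(typeof(doubled(1)) == typeof(doubled(1.0))));

// One named symbol => one map instantiation => one type.
static assert(is(typeof(doubledNamed(1)) == typeof(doubledNamed(1.0))));

void main() {}
```

The pipelines are identical in both versions; only where the function is declared changes.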

It would be nice if the compiler could collapse those duplicates 
by itself, wouldn't it? But...

```d
void main() {
    auto a = (int arg) => arg + 1;
    auto b = (int arg) => arg + 1;

    assert(a is b);
}
```

Should that assert pass? Are those two actually the same 
function? Right now it does NOT pass, but should it? That's a 
question for philosophers and theologians, way above my pay grade.

Then a practical worry: how does the compiler tell if two lambdas 
are actually identical? There's a surprising number of cases that 
look obvious to us but actually aren't. Suppose one refers to a 
different local variable. Or ends up with a different type of 
argument. Or what if they came from separate compilation units? 
It is legitimately more complex than it seems at first glance.

I digress again... where was I?

Oh yeah, since it is passing an alias to the function to the 
range instead of the type, the fact that they're considered 
distinct entities - even if just because the implementation is 
lazy and considers that one was created on line 5 and one was 
created on line 8 to be an irreconcilable difference - means the 
range based on that alias now has its own distinct type.
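
A small sketch of that, using a made-up `S` with an `apply` method and a made-up named function `twice` (neither is from Phobos). The first `static assert` encodes the current behavior described above, so treat it as an observation about today's compiler rather than a language guarantee.

```d
struct S(alias Func)
{
    int apply(int x) { return Func(x); }
}

// Named, top-level function: one symbol, however often it is mentioned.
int twice(int a) { return a * 2; }

void main()
{
    // Two textually identical lambdas are still two separate symbols...
    auto a = S!(x => x * 2)();
    auto b = S!(x => x * 2)();
    // ...so the two instantiations of S are unrelated types.
    static assert(!is(typeof(a) == typeof(b)));

    // The same named symbol gives the same instantiation both times.
    auto c = S!twice();
    auto d = S!twice();
    static assert(is(typeof(c) == typeof(d)));
    c = d; // compiles, unlike `a = b`

    assert(a.apply(21) == 42);
    assert(c.apply(21) == 42);
}
```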

Indeed, passing the lambda as a runtime arg fixes this to some 
extent, since at least then the types match up. But there's still 
a bit of generated code bloat (not NEARLY as much! but still a bit).
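
Roughly what the runtime-argument version looks like, as a sketch. The struct `R` and its `fn` field are invented for this example: the callable is stored as a plain function pointer, so the struct's type no longer depends on which lambda was passed.

```d
// Hypothetical wrapper: the function is a runtime field, not an alias
// parameter, so there is only one type R no matter what you store in it.
struct R
{
    int function(int) fn;

    int apply(int x) { return fn(x); }
}

void main()
{
    // The parameter type must be spelled out here; a bare `x => x * 2`
    // would leave the compiler nothing to infer the type from.
    auto a = R((int x) => x * 2);
    auto b = R((int x) => x * 2);

    static assert(is(typeof(a) == typeof(b)));
    a = b; // fine: both are just R

    assert(a.apply(21) == 42);
}
```

Each lambda still becomes its own function behind the scenes, which is the remaining bit of bloat.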

For best results, declare your own function as close to top-level 
as you can, with as many concrete types as you can, and give it a 
name. Don't expect the compiler to automatically factor things 
out for you. (For now. I still kinda hope the implementation can 
improve someday.)
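
One payoff of the named function, sketched below with made-up names (`twice`, `Doubled`, `Holder`): the type of the range pipeline can be written down once and reused, for example as a struct field.

```d
import std.algorithm : equal, map;
import std.range : iota;

// Top-level, fully typed, named.
int twice(int a) { return a * 2; }

// Because `twice` is a single symbol, this type is the same everywhere
// the same pipeline shape is built.
alias Doubled = typeof(iota(0, 1).map!twice);

struct Holder
{
    Doubled values;
}

void main()
{
    auto h = Holder(iota(0, 3).map!twice);

    // A pipeline built elsewhere with the same pieces is assignable.
    h.values = iota(5, 8).map!twice;

    assert(h.values.equal([10, 12, 14]));
}
```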

This is obviously more of a hassle. Even with a runtime param you 
have to specify more than just `a => a*2`... minimally like 
`(int a) => a*2`.

> ```d
> struct S(alias Func)
> {
>    pragma(msg, __traits(identifier, Func));
> }
>
> int func1(int a){ return a*2; }
>
> int func2(int a){ return a*2; }
>
> void main()
> {
>    auto a = S!func1();
>    auto b = S!func2();
>
>    pragma(msg, typeof(a));
>    pragma(msg, typeof(b));
>    a = b;
> }
>
> ```
> I'm going to go above my station and call this a bug in 
> typeof/typeid.

Wait, what's the bug there? The typeof DOES tell you they are 
separate.

Error: cannot implicitly convert expression `b` of type 
`S!(func2)` to `S!(func1)`

Just remember it isn't the function name per se, it is the symbol 
`alias` that S is taking. Which means it is different for each 
symbol passed... The alias in the parameter list tells you it is 
making a new type for each param. Same as if you did

```d
struct S(int a) {}
```

`S!1` would be a distinct type from `S!2`.