Recommendations on avoiding range pipeline type hell
Adam D. Ruppe
destructionator at gmail.com
Sun May 16 13:35:02 UTC 2021
On Sunday, 16 May 2021 at 12:54:19 UTC, Chris Piker wrote:
> a = b; // Lambdas as arguments instead of types works
Wait a sec, when you do the
```d
auto a = S!(a => a*2)();
```
That's not actually passing a type. That's passing the (hidden)
name of an on-the-spot-created function template as a compile-time
parameter. If it was just a type, you'd probably be ok!
It is that on-the-spot-created bit that trips you up. The compiler
doesn't bother looking at the content of the lambda to see if it is
repeated from something earlier. It just sees that shorthand
syntax and blindly expands it to:
```
static int __some_internal_name_created_on_line_5(int a) {
    return a*2;
}
```
(even that isn't entirely true, because you didn't specify the
type of `a` in your example, meaning the compiler actually passes
```
template __some_internal_name_created_on_line_5(typeof_a) {
    auto __some_internal_name_created_on_line_5(typeof_a a) {
        return a*2;
    }
}
```
and that *template* is instantiated with the type the range
passes to the lambda - inside the range's implementation - to
create the *actual* function that it ends up calling.
But that's not really important to the point, since you'd get the
same thing even if you did specify the types in this situation.)
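To make that parenthetical a bit more concrete, here is a minimal sketch showing that an untyped lambda behaves like a template that is instantiated once per argument type. The alias name `twice` is made up for illustration and is not from the original example.

```d
// Hypothetical name `twice`, used only for illustration.
// An untyped lambda is really a function template under the hood.
alias twice = (a) => a * 2;

void main()
{
    // Each distinct argument type instantiates the template separately,
    // so the return type follows the argument type.
    static assert(is(typeof(twice(2)) == int));
    static assert(is(typeof(twice(2.5)) == double));

    assert(twice(2) == 4);
    assert(twice(2.5) == 5.0);
}
```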
Anyway, when you repeat it later, it makes *another*:
```
static int __some_internal_name_created_on_line_8(int a) {
    return a*2;
}
```
And passes that. Wanna know what's really nuts? If it is made in
the context of another template, even being on the same line
won't save you from duplicates. It creates a new copy of the
lambda for each and every distinct context it sees. Same thing in
a different object? Another function. Different line? Another
function. Different template argument in the surrounding
function? Yet another function.
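A small sketch of the effect described above, assuming a do-nothing wrapper `S` that stands in for a range type taking an alias parameter (the wrapper is illustrative only). Two textually identical lambdas still produce two different types:

```d
// Hypothetical wrapper standing in for a range that takes an alias.
struct S(alias Func) {}

void main()
{
    auto a = S!(x => x * 2)();
    auto b = S!(x => x * 2)();

    // The lambdas look the same, but each one is its own hidden symbol,
    // so the two instantiations of S are distinct types.
    static assert(!is(typeof(a) == typeof(b)));
}
```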
At my day job, at one point, one little `(a,b) => a < b`
sorting lambda exploded to *two gigabytes* of generated identical
functions in the compiler's memory, and over 100 MB in the
generated object files. Simply moving that out to a top-level
function eliminated all that bloat... most of us could barely
believe such a little thing had such a profound impact.
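Roughly what that kind of fix looks like, as a sketch only: the name `lessThan` and the `int` element type are assumptions for illustration, not the actual day-job code. The comparison lives in one named top-level function and every call site passes that same symbol.

```d
import std.algorithm.sorting : sort;

// Hypothetical top-level comparison; one symbol shared by all call sites.
bool lessThan(int a, int b)
{
    return a < b;
}

void main()
{
    auto xs = [3, 1, 2];

    // Pass the named function instead of writing `(a, b) => a < b` inline.
    xs.sort!lessThan();

    assert(xs == [1, 2, 3]);
}
```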
It would be nice if the compiler could collapse those duplicates
by itself, wouldn't it? But...
    void main() {
        auto a = (int arg) => arg + 1;
        auto b = (int arg) => arg + 1;
        assert(a is b);
    }
Should that assert pass? Are those two actually the same
function? Right now it does NOT, but should it? That's a question
for philosophers and theologians, way above my pay grade.
Then a practical worry: how does the compiler tell if two lambdas
are actually identical? There's a surprising number of cases that
look obvious to us, but aren't actually. Suppose it refers to a
different local variable. Or ends up with a different type of
argument. Or what if they came from separate compilation units?
It is legitimately more complex than it seems at first glance.
I digress again... where was I?
Oh yeah, since it is passing an alias to the function to the
range instead of the type, the fact that they're considered
distinct entities - even if just because the implementation is
lazy and considers that one was created on line 5 and one was
created on line 8 to be an irreconcilable difference - means the
range based on that alias now has its own distinct type.
Indeed, passing the lambda as a runtime arg fixes this to some
extent, since at least then the types match up. But there's still a
bit of generated code bloat (not NEARLY as much! but still a bit).
For best results, declare your own function as close to top-level
as you can with as many discrete types as you can, and give it a
name. Don't expect the compiler to automatically factor things
out for you. (For now, anyway. I still kinda hope the implementation
can improve someday.)
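As a sketch of that advice, assuming the same kind of alias-taking wrapper `S` as in the quoted example and a made-up function name `twice`: when both instantiations are handed the same named symbol, they are the same type and assignment just works.

```d
// Hypothetical wrapper taking an alias, as in the quoted example.
struct S(alias Func) {}

// Named, top-level, with explicit types.
int twice(int a)
{
    return a * 2;
}

void main()
{
    auto a = S!twice();
    auto b = S!twice();

    // Same symbol passed both times, so it is the same instantiation.
    static assert(is(typeof(a) == typeof(b)));
    a = b;
}
```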
This is obviously more of a hassle. Even with a runtime param you
have to specify more than just `a => a*2`... minimally like
`(int a) => a*2`.
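Here is a rough sketch of the runtime-argument variant. The names `Mapper` and `mapper` are invented for illustration. The wrapper is parameterized on the *type* of the callable rather than on an alias to it, so two separately written lambdas with the same signature end up with the same wrapper type.

```d
// Hypothetical wrapper: templated on the callable's type, not an alias.
struct Mapper(F)
{
    F func; // the callable is stored as a runtime value
}

// Helper so the type F is deduced from the argument.
auto mapper(F)(F f)
{
    return Mapper!F(f);
}

void main()
{
    // The parameter type must be spelled out; a bare `x => x * 2`
    // has no concrete type for F to be deduced from.
    auto a = mapper((int x) => x * 2);
    auto b = mapper((int x) => x * 2);

    // Both lambdas have the same function pointer type, so the wrappers match.
    static assert(is(typeof(a) == typeof(b)));
    a = b;

    assert(a.func(21) == 42);
}
```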
> ```d
> struct S(alias Func)
> {
>     pragma(msg, __traits(identifier, Func));
> }
>
> int func1(int a) { return a*2; }
>
> int func2(int a) { return a*2; }
>
> void main()
> {
>     auto a = S!func1();
>     auto b = S!func2();
>
>     pragma(msg, typeof(a));
>     pragma(msg, typeof(b));
>     a = b;
> }
> ```
> I'm going to go above my station and call this a bug in
> typeof/typeid.
Wait, what's the bug there? The typeof DOES tell you they are
separate.
    Error: cannot implicitly convert expression `b` of type
    `S!(func2)` to `S!(func1)`
Just remember it isn't the function name per se, it is the symbol
`alias` that S is taking. Which means it is different for each
symbol passed... The alias in the parameter list tells you it is
making a new type for each param. Same as if you did
    struct S(int a) {}
`S!1` would be a distinct type from `S!2`.
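That value-parameter analogy can be checked at compile time with a couple of static asserts. This is just a sketch of the comparison, nothing more:

```d
struct S(int a) {}

// Same argument, same instantiation, same type.
static assert(is(S!1 == S!1));

// Different argument, different instantiation, different type.
static assert(!is(S!1 == S!2));

void main() {}
```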