delegate confusion

Timon Gehr via Digitalmars-d digitalmars-d at puremagic.com
Fri Aug 4 10:27:52 PDT 2017


On 04.08.2017 18:57, bitwise wrote:
> I'm confused about how D's lambda capture actually works, and can't find 
> any clear specification on the issue. I've read the comments on the bug 
> about what's described below, but I'm still confused. The conversation 
> there dropped off in 2016, and the issue hasn't been fixed, despite high 
> bug priority and plenty of votes.
> 
> Consider this code:
> 
> void foo() {
>      void delegate()[] funs;
> 
>      foreach(i; 0..5)
>          funs ~= (){ writeln(i); };
> 
>      foreach(fun; funs)
>          fun();
> }
> 
> void bar() {
>      void delegate()[] funs;
> 
>      foreach(i; 0..5)
>      {
>          int j = i;
>          funs ~= (){ writeln(j); };
>      }
>      foreach(fun; funs)
>          fun();
> }
> 
> 
> void delegate() baz() {
>      int i = 1234;
>      return (){ writeln(i); };
> }
> 
> void overwrite() {
>      int i = 5;
>      writeln(i);
> }
> 
> int main(string[] argv)
> {
>      foo();
>      bar();
> 
>      auto fn = baz();
>      overwrite();
>      fn();
> 
>      return 0;
> }
> 
> First, I run `foo`. The output is "4 4 4 4 4".
> So I guess `i` is captured by reference, and the second loop in `foo` 
> works because the stack hasn't unwound, and `i` hasn't been overwritten, 
> and `i` contains the last value that was assigned to it.
> 
> Next I run `bar`. I get the same output of "4 4 4 4 4". While this hack 
> works in C#,

It's very important to understand that the C# case is different, even though 
it looks similar. In D, the foreach loop variable is a distinct 
declaration for each loop iteration, while in C#, the same loop variable 
is repeatedly reassigned. In C#, the issue is bad language design, while 
in D, the issue is a buggy compiler implementation leading to memory 
corruption.

> I suppose it's reasonable to assume the D compiler would 
> just reuse stack space for `j`,

It's reasonable to assume that the D compiler uses the same memory 
location for all of the distinct variables. This is a dangling pointer 
bug, if you wish. Both of your examples should print "0 1 2 3 4".

> and that the C# compiler has some 
> special logic built in to handle this.
> ...

The C# compiler just uses the correct rules for creating closures. (It 
is hard for the compiler to screw this up, because the underlying 
platform aims to prevent memory corruption.)

> Now, I test my conclusions above, and run `baz`, `overwrite` and `fn`. 
> The result? total confusion.
> The output is "5" then "1234". So if the lambdas are referencing the 
> stack, why wasn't 1234 overwritten?
> ...

The lambdas are referencing the heap, but all of them reference 
identical heap locations. This should not happen. Distinct variables 
shouldn't share the same memory.

> Take a simple C++ program for example:
> 
> int* foo() {
>      int i = 1234;
>      return &i;
> }
> 
> void overwrite() {
>      int i = 5;
>      printf("%d\n", i);
> }
> 
> int main()
> {
>      auto a = foo();
>      overwrite();
>      printf("%d\n", *a);
>      return 0;
> }
> 
> This outputs "5" and "5" which is exactly what I expect, because I'm 
> overwriting the stack space where the first `i` was stored with "5".
> 
> So now, I'm thinking.... D must be storing these captures on the heap
> then..right? So why would I get "4 4 4 4 4" instead of "0 1 2 3 4" for 
> `foo` and `bar`?
> 
> This makes absolutely no sense at all.
> 
> It seems like there are two straightforward approaches available here:
> 
> 1) capture everything by reference, in which case the `overwrite` 
> example would work just like the C++ version. Then, it would be up to 
> the programmer to heap allocate anything living beyond the current scope.
> ...

Capturing by reference is not the same as creating stack references. The 
language semantics don't even need to be implemented using a stack.
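
To illustrate the distinction, here is a minimal sketch (the names 
`makeCounter`, `a` and `b` are made up for this example). The variable 
`count` is captured by reference, so updates persist between calls, yet 
it is not a reference into a dead stack frame: the compiler moves it into 
a heap-allocated closure because the delegate outlives the call.

```d
import std.stdio;

// `count` is captured by reference, but it has to outlive this call,
// so it is stored in a heap-allocated closure frame, not on the stack.
int delegate() makeCounter()
{
    int count = 0;
    return () => ++count;
}

void main()
{
    auto a = makeCounter();
    auto b = makeCounter();
    a();
    a();
    writeln(a()); // 3: updates made through the delegate persist
    writeln(b()); // 1: a separate call gets a separate frame
}
```

This is the same situation as `baz` in the quoted code, which is why 
`overwrite` cannot clobber the captured value.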

> 2) heap allocate a chunk of space for each lambda's captures, and copy 
> everything captured into that space when the lambda is constructed. This 
> of course, would mean that `foo` and `bar` would both output "0 1 2 3 4".
> ...

3) heap-allocate a chunk of space for each captured scope (as in Lisp 
and C#).

The way to go is 3). 1) is bad because it completely prevents closures 
from escaping; 2) is bad because it does not allow sharing of 
closure memory.
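
A rough sketch of what 3) amounts to, written out by hand. The class 
`Frame` and the function `collectPerScopeFrames` are hypothetical names 
used only for this illustration; a compiler would generate something 
along these lines automatically. Each loop iteration gets its own heap 
frame, and all closures created in that iteration share it.

```d
import std.stdio;

// Hypothetical stand-in for the frame allocated for one execution
// of a scope; it holds the variables captured from that scope.
final class Frame
{
    int j;
    int get() { return j; }
    void bump() { ++j; }
}

int[] collectPerScopeFrames()
{
    int delegate()[] getters;
    void delegate()[] bumpers;

    foreach (i; 0 .. 3)
    {
        auto frame = new Frame; // one heap frame per loop iteration
        frame.j = i * 10;
        getters ~= &frame.get;  // both delegates share this frame
        bumpers ~= &frame.bump;
    }

    bumpers[1]();               // changes only the second iteration's frame

    int[] seen;
    foreach (g; getters)
        seen ~= g();
    return seen;
}

void main()
{
    writeln(collectPerScopeFrames()); // [0, 11, 20]
}
```

The getter and the bumper of one iteration see the same `j` (shared 
closure memory, which 2) would not give you), while different iterations 
do not interfere with each other.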

> When I look at the output I get from the code above though, it seems 
> like neither of these things were done, and that someone has gone way 
> out of their way to implement some very strange behavior.
> ...

Absolutely not. The current behavior was quite straightforward to 
implement, but it is wrong. Bugs often lead to strange behavior. This 
does not imply that such bugs are intentional.

> What I would prefer, would be a mixture of reference and value capture 
> like C++, where I could explicitly state whether I wanted (1) or (2). I 
> would settle for (2) though.
> ...

"Like C++" does not work: in C++, each lambda has its own unique type.

> While I'm sure there is _some_ reason that things currently work the way 
> they do, the current behavior is very unintuitive, and gives no control 
> over how things are captured.
> 

You can work around the bug like this:

foreach(i;0..5)(){
     int j=i;
     funs~=(){ writeln(j); };
}();
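
For reference, a self-contained version of this workaround might look 
like the sketch below. The function name `collectWithWorkaround` is made 
up for the example, and the delegates return their value instead of 
printing it so that the result is easy to inspect. The loop body is a 
function literal that is called immediately, so `j` belongs to a fresh 
function call (and hence a fresh closure frame) on every iteration.

```d
import std.stdio;

// Builds five delegates using the immediately-called function
// literal, then returns the value each delegate observes.
int[] collectWithWorkaround()
{
    int delegate()[] funs;

    foreach (i; 0 .. 5) () {
        // `j` is a local of this function literal call, so each
        // iteration allocates its own closure frame for it.
        int j = i;
        funs ~= () => j;
    }();

    int[] seen;
    foreach (fun; funs)
        seen ~= fun();
    return seen;
}

void main()
{
    writeln(collectWithWorkaround()); // [0, 1, 2, 3, 4]
}
```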


