Let's get the semantic around closure fixed.

Paul Backus snarwin at gmail.com
Wed May 19 15:11:52 UTC 2021


On Wednesday, 19 May 2021 at 14:24:57 UTC, Jesse Phillips wrote:
> ```dlang
> for (int i = 0; i < 10; i++) {
>     int index = i;
>     dgs ~= () {
>         import std.stdio;
>         writeln(index);
>     };
> }
> ```
>
> When this loop concludes, the value of `i` is 10 and the value 
> of index is 9 (as shown from your output).
>
> This is because within the `for` logic `i` was increased and it 
> determined `10 < 10` is false. This means the `for` body is not 
> executed again leaving `index` at 9.
>
> I don't know what compiler magic you would expect to be "correct" 
> here. We can't say `i` should be 9 as the loop would not have 
> exited then. We certainly don't want `index` to be 10 as that 
> would mean the loop executed one more time than it was defined 
> to.

A local variable's lifetime starts at its declaration and ends at 
the closing brace of the scope where it's declared:

```d
void main() {
     int x; // start of x's lifetime
     {
         int y; // start of y's lifetime
     } // end of y's lifetime
     int z; // start of z's lifetime
} // end of x's and z's lifetimes
```

This also applies to variables inside loops:

```d
void main() {
     foreach (i; 0 .. 10) {
         int x; // start of x's lifetime
     } // end of x's lifetime
}
```

We can see that this is the case by declaring a variable with a 
destructor inside a loop:

```d
import std.stdio;

struct S {
     ~this() { writeln("destroyed"); }
}

void main() {
     foreach (i; 0 .. 10) {
         S s; // start of s's lifetime
     } // end of s's lifetime
}
```

The above program prints "destroyed" 10 times. At the start of 
each loop iteration, a new instance of `s` is initialized; at the 
end of each iteration, it is destroyed.

Normally, an instance of a variable declared inside a loop cannot 
outlive the loop iteration in which it was created, so the 
compiler is free to reuse the same memory for each instance. We 
can verify that it does so by printing out the address of each 
instance:

```d
import std.stdio;

struct S
{
     ~this() { writeln("destroyed ", &this); }
}

void main()
{
     foreach (i; 0 .. 10) {
         S s;
     }
}
```

On `run.dlang.io`, this prints "destroyed 7FFE478D283C" 10 times.

However, when an instance of a variable declared inside a loop is 
captured in a closure, it becomes possible to access that 
instance even after the loop iteration that created it has 
finished. In this case, the lifetimes of the instances may 
overlap, and it is no longer a valid optimization to re-use the 
same memory for each one.

We can see this most clearly by declaring the variable in the 
loop `immutable`:

```d
void main() {
     int delegate()[10] dgs;

     foreach (i; 0 .. 10) {
         immutable index = i;
         dgs[i] = () => index;
         assert(dgs[i]() == i);
     }

     foreach (i; 0 .. 10) {
         // if this fails, something has mutated immutable data!
         assert(dgs[i]() == i);
     }
}
```

If you run the above program, you will see that the assert in the 
second loop does, in fact, fail. By using the same memory to 
store each instance of `index`, the compiler has generated 
incorrect code that allows us to observe mutation of `immutable` 
data, which the language spec itself says is undefined 
behavior.

In order to compile this code correctly, the compiler *must* 
allocate a separate location in memory for each instance of 
`index`. Those locations can be either on the stack (if the 
closure does not outlive the function) or on the heap; the 
important part is that they cannot overlap.
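
To illustrate what "a separate location for each instance" looks like in 
practice, here is a minimal sketch that gets the same effect by hand. It 
assumes the usual workaround of moving the captured variable into a 
function parameter, so that each call gets its own closure context. The 
names `makeGetter` and `makeGetters` are hypothetical and exist only for 
this example.

```d
// Hypothetical helper: `index` is a parameter, so every call to
// makeGetter has its own copy of it, and the returned delegate
// captures that copy rather than a slot shared between iterations.
int delegate() makeGetter(int index) {
    return () => index;
}

// Builds one delegate per loop iteration using the helper above.
int delegate()[10] makeGetters() {
    int delegate()[10] dgs;
    foreach (i; 0 .. 10) {
        dgs[i] = makeGetter(i);
    }
    return dgs;
}

void main() {
    auto dgs = makeGetters();
    foreach (i; 0 .. 10) {
        // Each delegate still sees the value from its own iteration.
        assert(dgs[i]() == i);
    }
}
```

Here the separation is spelled out by the programmer. The point above is 
that the compiler should provide it automatically for variables declared 
in a loop body.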

