DIP25/DIP1000: My thoughts round 2

Sun Sep 2 05:14:58 UTC 2018

Round 2 because I had this whole thing typed up, and then my 
power went out on me right before I posted. I was much happier 
with how that one was worded too.

Basically I'd like to go over at length one of the issues I see 
with these DIPs (though I think it applies more to DIP1000), 
namely return parameters and what we could do to make them 
stronger. I will say I do not have the chops to go implement 
these ideas myself, even if I had approval and support. This is 
more to get my thoughts out there and see what other people think 
about them (frankly I'd be putting this in Study if it wasn't a 
ghost town over there).

First I'm going to go over DIP25 as I understand it for 
background, stealing some examples from the DIP page. Let's 
start with the following.

ref int id(ref int x) {
     return x; // pass-through function that does nothing
}

ref int fun() {
     int x;
     return id(x); // escape the address of local variable
}

The id() function just takes and returns a variable by ref, which 
is perfectly legal. However it is open to abuse. As you see in 
fun(), id() is used to escape a reference to a local variable, 
which is obviously not desired behavior. The issue is how do we 
tell fun(), from id()'s signature alone, "id() will return a 
reference to whatever you pass it, one way or another. Make sure 
you don't give id()'s return value to something that'll outlive 
the argument you pass to id()" (though we need to say this in 
more concise terms obviously). DIP25 solves this pretty nicely 
with return parameters:

// now this function is banned, since it has a ref parameter and returns by ref
ref int wrongId(ref int x) {
     return x; // ERROR! Cannot return a ref, please use "return ref"
}

// this is fine however
ref int id(return ref int x) {
     return x;
}

ref int fun() {
     int x;
     static int y;
     return id(x); // no, wait, since we're returning to a scope that'll
                   // outlive x, this errors at compile-time. Thanks return ref
     return id(y); // fine, sure, y lives forever
}

fun() now knows the return value of id() cannot outlive the 
argument it passes to id(). This allows us to disallow certain 
undesired behavior at compile-time, which is great.
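
For contrast, here's a minimal sketch of the uses that should stay legal with "return ref". The names counter, viaGlobal and viaLocal are made up by me for illustration, they aren't from the DIP. The idea is that the reference id() hands back is fine as long as it never outlives the thing that was passed in.

```d
import std.stdio : writeln;

// `return ref` tells callers: the result may alias `x`,
// so it must not outlive whatever was passed as `x`.
ref int id(return ref int x) @safe
{
    return x;
}

int counter; // hypothetical module-level variable, outlives every call

// Legal: `counter` outlives viaGlobal(), so the reference can be passed on.
ref int viaGlobal() @safe
{
    return id(counter);
}

// Legal: the reference is used and dropped while `local` is still alive.
int viaLocal() @safe
{
    int local = 20;
    id(local) += 1; // modifies `local` through the returned reference
    return local;   // returned by value, so nothing escapes
}

void main() @safe
{
    viaGlobal() = 41;    // writes to `counter` through the reference
    id(counter) += 1;
    writeln(counter);    // 42
    writeln(viaLocal()); // 21
}
```

Note that viaLocal() returns int rather than ref int. Returning id(local) by ref is exactly the case DIP25 is supposed to reject.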

With that in mind, let's move on to DIP1000. Namely, I'm looking 
at this issue Walter filed.

https://issues.dlang.org/show_bug.cgi?id=19097

I'll try to detail it here (and steal more examples, thanks Mike 
:*) ). It has to do with the same principles I outlined above for 
DIP25, only this time we're using pointers rather than refs.

First example, which works as expected

int* frank(return scope int* p) { return p; } // basically id()

void main()
{
     // lifetimes end in reverse order from which they are declared
     int* p;  // `p`'s lifetime is longer than `i`'s
     int i;   // `i`'s lifetime is longer than `q`'s
     int* q;  // `q`'s lifetime is the shortest

     q = frank(&i); // ok because `i`'s lifetime is longer than `q`'s
     p = frank(&i); // error because `i`'s lifetime is shorter than `p`'s
}

frank() marks its parameter as return, to signal to main() that 
wherever main() puts frank()'s return value, it can't outlive 
what main() passed as an argument to frank(). All fine and dandy.

Second example (I'd pay closer attention to betty()'s definition 
here)

void betty(ref scope int* r, return scope int* p)
{
     r = p; // (1) Error: scope variable `p` assigned to `r` with longer lifetime
}

void main()
{
     int* p;
     int i;
     int* q;

     betty(q, &i); // (2) ok
     betty(p, &i); // (3) should be error
}

Hang on, why can't I compile betty(), when it's doing the same 
thing as frank(), only putting the return value in the first 
parameter rather than returning it? No reason, I absolutely 
should be able to compile and use betty(). So the question 
becomes: how can betty() tell main() that what main() passes as 
the first argument to betty() can't outlive what's passed as the 
second argument? Marking the second parameter return does not 
work here, as that only ties its lifetime to the return value. It 
can't be used on arbitrary parameters. How to resolve this?

Walter's solution is as follows. If a function is void, and its 
first parameter is ref, treat the "return" annotation as tying the 
parameter to the first parameter rather than to the return value 
of the function. 
Using these conditions, betty() now compiles, and main() errors 
at (3) as expected. However I find this solution too restrictive. 
While it fits many functions within Phobos, we are tying users to 
this special case and forcing them to unnecessarily refactor 
their code around it. What if I don't want it to be void and want 
the function to return something as well? What if I want to 
return via the second parameter? This just seems to be setting up 
another trap for users to fall into.

I talked about this in the "Is @safe still a work-in-progress?" 
thread, but I'll repeat it here again. There is a cleaner way to 
do this. I'll demonstrate using some borrowed Rust syntax, but 
remember the syntax matters much less here than the idea. Rather 
than using "return", we instead annotate the parameters like so:

void betty(ref scope int*'a r, scope int*'a p) // okay it's not pretty
{
     r = p; // cool, p's lifetime is tied to r's lifetime
}

void main()
{
     int* p;
     int i;
     int* q;

     betty(q, &i); // (2) ok
     betty(p, &i); // (3) error
}

Good, these are the results I expect. What if I want to output to 
the second parameter?

void betty(scope int*'a r, ref scope int*'a p)
{
     p = r; // cool, p's lifetime is tied to r's lifetime
}

void main()
{
     int* p;
     int i;
     int* q;

     betty(&i, q); // (2) ok
     betty(&i, p); // (3) error
}

Nice, that'll work too.

Here's frank()

int*'a frank(scope int*'a p) { return p; } // basically id()

void main()
{
     // lifetimes end in reverse order from which they are declared
     int* p;  // `p`'s lifetime is longer than `i`'s
     int i;   // `i`'s lifetime is longer than `q`'s
     int* q;  // `q`'s lifetime is the shortest

     q = frank(&i); // ok because `i`'s lifetime is longer than `q`'s
     p = frank(&i); // error because `i`'s lifetime is shorter than `p`'s
}

These annotations are much more flexible since they can be moved 
any which way around the function signature, and have the added 
benefit of visually tying together lifetimes. For further 
consistency it could also be extended back to DIP25:

ref'a int id(ref'a int x) {
     return x;
}

Hopefully that was coherent. Again this is mostly for me to get my 
thoughts out there, but I'm also interested in what other people 
think about this.