Proposal for design of 'scope' (Was: Re: Opportunities for D)

Thu Jul 10 13:10:36 PDT 2014

I've been working on a proposal for ownership and borrowing for 
some time, and I seem to have arrived at a very similar result to 
yours. It is not really ready, because I keep discovering 
weaknesses and can only work on it in my free time, but I'm glad 
this topic is finally being addressed. Here is what I have so far:

First of all, as you've already stated, scope needs to be a type 
modifier (currently it's a storage class, I think). This has 
consequences for the syntax of any parameters it takes, because 
type modifiers also need type constructors (like `const(T)`), and 
`scope(...)` would already be taken by that. This means the 
`scope(...)` syntax for parameters is out. I suggest using 
template instantiation syntax instead: `scope!(...)`, which 
combines freely with the type constructor syntax: 
`scope!lifetime(MyClass)`.

Explicit lifetimes are indeed necessary, but dedicated 
identifiers for them are not. Instead, a lifetime can refer 
directly to the symbol of its "owner". Example:

     int[100] buffer;
     scope!buffer(int[]) slice;

Instead of expressing lifetime intersections with `&` (I believe 
Timon proposed that in the original thread), simply specify 
multiple "owners": `scope!(a, b)`. This works because, as far as I 
can see, there is no need for lifetime unions, only intersections.
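To make the "intersections only" argument concrete, here is a toy model in present-day D. It is purely hypothetical and not part of the proposal: a lifetime is represented by the nesting depth of the lexical block that declares its owner. Two owners that are both visible at the same point always lie on the same chain of enclosing blocks, so their intersection is simply the deeper, shorter-lived one; a union never comes up.

```d
import std.algorithm.comparison : max;

/// Hypothetical model: a lifetime is the nesting depth of the lexical block
/// that declares the owner. Deeper blocks end earlier.
/// (Declaration order within one block is ignored for simplicity.)
struct Lifetime {
    uint depth;
}

/// `scope!(a, b)`: the value may live only as long as the shorter-lived
/// owner, i.e. the one declared in the deeper block.
Lifetime intersect(Lifetime a, Lifetime b) {
    return Lifetime(max(a.depth, b.depth));
}

/// A value may be stored in a variable declared at `targetDepth` only if
/// that variable's block ends no later than the owner's block.
bool canStoreIn(Lifetime value, uint targetDepth) {
    return targetDepth >= value.depth;
}
```

Taking the maximum depth is all an intersection needs in this model, which is why a list of owners is enough.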

A problem that has been discussed in a few places is safely 
returning a slice or a reference to an input parameter. This can 
be solved nicely:

     scope!haystack(string) findSubstring(
         scope string haystack,
         scope string needle
     );

Inside `findSubstring`, the compiler can make sure that no 
references to `haystack` or `needle` escape (an unqualified 
`scope` can be used here, no need to specify an "owner"), but it 
will allow returning a slice of `haystack`, because the signature 
says: "The return value will not live longer than the parameter 
`haystack`."

     // fixed-size arrays (new syntax of Kenji's PR)
     immutable(char)[$] text = "Old McDonald had a farm.";
     auto sub = findSubstring(text, "had");
     // typeof(sub) is scope!text(string),
     // `haystack` gets substituted by `text`
     assert(sub == "had a farm.");
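For comparison, the same function can be written in today's D, where the dependency exists but is invisible in the signature: the returned slice points straight into `haystack`'s memory. This is only a sketch; it takes `const(char)[]` so that a fixed-size `char` array can be passed as a slice, and it returns the rest of the haystack starting at the match, which is what the `assert` above expects.

```d
import std.string : indexOf;

/// Returns the tail of `haystack` starting at the first occurrence of
/// `needle`, or an empty slice if there is none. The result shares memory
/// with `haystack`: exactly the dependency `scope!haystack(string)` would
/// make visible in the signature.
const(char)[] findSubstring(const(char)[] haystack, const(char)[] needle) {
    immutable i = haystack.indexOf(needle);
    return i < 0 ? haystack[$ .. $] : haystack[i .. $];
}
```

Nothing here stops a caller from keeping `sub` after `text` has gone out of scope; with the proposed annotation, that would become a compile error.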

Have multiple parameters? No problem:

     scope!(a,b)(string) selectOneAtRandom(
         scope string a,
         scope string b
     );
     // => a _and_ b will outlive return value

For methods, `scope!this` can be used too. It's really no 
different from other parameters, as `this` is just a special 
implicit parameter.

There is also a nice extension: `scope!(const owner)`. This 
means that as long as a value of such a type lives, `owner` will 
be treated as const.

An interesting application is the old `byLine` problem, where the 
function keeps an internal buffer which is reused for every line 
that is read, but a slice into it is returned. When a user 
naively stores these slices in an array, she will find that all 
of them have the same content, because they point to the same 
buffer. See how this is avoided with `scope!(const ...)`:

struct ByLineImpl(Char, Terminator) {
private:
     Char[] line;
     // ...

public:
     // - return value must not outlive `this` (i.e. the range)
     // - as long as the return value exists, `this` will be const
     @property scope!(const this)(const(Char)[]) front() const {
         return line;
     }
     void popFront() { // not `const`, of course
         // ...
     }
     // ...
}

import std.stdio : stdin, write;

void main() {
     alias Line = const(char)[];
     auto byline = stdin.byLine();
     foreach(line; byline) {
         write(line); // OK, `write` takes its parameters as scope
         // (assuming widespread usage of scope throughout Phobos)
     }
     Line[] lines;
     foreach(line; byline) {
         lines ~= line;
         // ERROR: `line` has type scope!(const byline)(Line), not Line
     }
     // let's try to work around it:
     scope!(const byline)(Line)[] clines;
     foreach(line; byline) {     // ERROR: `byline` is const,
                                 // so `popFront` can't be called
         clines ~= line;
     }
     // => nope, won't work
     // another example, to show how it works:
     auto tmp = byline.front;    // OK
     // `byline` is const as long as `tmp` exists
     write(byline.front);        // OK, `front` is const
     byline.popFront();          // ERROR: `byline` is const
}
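The failure mode that `scope!(const ...)` is meant to catch can be reproduced today without stdin. `ReusingLines` below is a hypothetical stand-in for `byLine` (not Phobos code) that copies every line into one internal buffer and reuses it; storing the slices it hands out leaves every stored element showing the last line.

```d
import std.string : indexOf;

/// Hypothetical stand-in for `stdin.byLine`: each line is copied into one
/// internal buffer that is reused for every line, and `front` returns a
/// slice of that buffer.
struct ReusingLines {
    private string input;
    private char[] buffer;  // reused for every line
    private char[] line;    // slice of `buffer` holding the current line
    private bool done;

    this(string input) {
        this.input = input;
        popFront();
    }

    bool empty() const { return done; }

    char[] front() { return line; }

    void popFront() {
        if (input.length == 0) { done = true; return; }
        immutable end = input.indexOf('\n');
        string text = end < 0 ? input : input[0 .. end];
        input = end < 0 ? null : input[end + 1 .. $];
        // Growing may reallocate; otherwise the same memory is overwritten.
        if (buffer.length < text.length) buffer.length = text.length;
        buffer[0 .. text.length] = text[];
        line = buffer[0 .. text.length];
    }
}
```

The test input uses lines of equal length so the buffer never reallocates; every stored slice then aliases the same three bytes. Copying each line with `.idup` is the usual workaround, and it is exactly what the proposed types would force.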

Describing what happens here: as long as any variable (or 
temporary) of type `scope!(const byline)(...)` exists, `byline` 
itself is treated as const. "Exists" in this case refers only to 
lexical scope: a variable is said to "exist" from the point where 
it is declared to the end of the scope it's declared in. Loops, 
gotos, and exceptions have no effect on this. This means that the 
compiler can check it easily, without having to perform 
complicated control flow analysis.
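Current D has no compile-time equivalent, but the rule can be approximated at run time in the style of Rust's `RefCell`. The sketch below swaps in that runtime technique; it is not the static, lexical check proposed above, and `Owner`, `Borrow`, `borrow` and `trySet` are hypothetical names. A `Borrow` increments a counter when created and decrements it in its destructor, which D runs at the end of the enclosing scope, so a borrow "exists" for exactly the lexical region described above.

```d
/// Hypothetical runtime analogue of `scope!(const owner)`: while any
/// `Borrow` is alive, mutating the owner is rejected. The proposal checks
/// this statically; this sketch only counts live borrows at run time.
struct Owner {
    private int value;
    private size_t borrows;

    static struct Borrow {
        private Owner* owner;
        @disable this(this);   // no copies, so the count stays exact
        ~this() { if (owner) --owner.borrows; }
        int get() const { return owner.value; }
    }

    /// Hands out a read-only view and marks the owner as borrowed.
    Borrow borrow() {
        ++borrows;
        return Borrow(&this);
    }

    /// Mutation is refused while any borrow exists ("const" while borrowed).
    bool trySet(int v) {
        if (borrows != 0) return false;
        value = v;
        return true;
    }
}
```

The owner must not be moved while borrowed, since `Borrow` holds a raw pointer to it; the static version would not need that caveat.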

I also thought about allowing `scope!return` for functions, to 
specify that a value will not outlive the value returned from the 
function, but I'm not sure whether there is an actual use case, 
and the semantics are not clear.

An open question is whether there needs to be an explicit 
designation of GC'd values (for example by `scope!static` or 
`scope!GC`), to say that a given value lives as long as it's 
needed (or "forever").

Specifying an owner in the type also integrates naturally with 
allocators. Assuming an allocator releases all of its memory to 
the operating system when it is destroyed, there needs to be a 
guarantee that none of its contents are referenced anymore at that 
point. This can be achieved by returning a borrowed reference:

     struct MyAllocator {
         scope!this(T) alloc(T)() if (is(T == class)) {
             // ...
         }
     }

Note that this does not preclude the allocator from doing garbage 
collection while it exists; in this manner, `scope!GC` might just 
be an application of this pattern instead of a special syntax.
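A minimal sketch of such an allocator in current D, assuming a simple bump ("region") strategy over one `malloc`'d block. `Region` and its `alloc` are hypothetical; they hand out raw bytes rather than class instances and ignore alignment, to keep it short. Nothing in today's type system stops a slice returned by `alloc` from outliving the `Region`, which is precisely the gap `scope!this` in the return type would close.

```d
import core.stdc.stdlib : malloc, free;

/// Hypothetical bump allocator: every allocation is a slice of one
/// malloc'd block, and the whole block is freed when the allocator is
/// destroyed, so no slice it returns may outlive it.
struct Region {
    private ubyte* base;
    private size_t capacity, used;

    @disable this(this);   // a copy would free the block twice

    this(size_t capacity) {
        base = cast(ubyte*) malloc(capacity);
        assert(base !is null, "out of memory");
        this.capacity = capacity;
    }

    ~this() { free(base); }   // free(null) is a no-op

    /// Returns `n` bytes from the region, or null if it is exhausted.
    ubyte[] alloc(size_t n) {
        if (capacity - used < n) return null;
        auto result = base[used .. used + n];
        used += n;
        return result;
    }
}
```

Once `r` goes out of scope, any slice obtained from `r.alloc` dangles; today that is only a convention, under the proposal it would be a type error.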

Now, for the problems:

Obviously, there is quite a bit of complexity involved. I can 
imagine that inferring the scope for templates (which is 
essential, just as for const and the other type modifiers) can be 
complicated.

On the upside, at least it requires no control or data flow 
analysis. It's also a purely additive change: If implemented 
right, no currently working code will break.

Then I encountered the following problem, and there are several 
different variations of it:

     import core.stdc.stdlib : free;

     struct S {
         int* p;
         void releaseBuffer() scope {
             // `scope` in the signature applies to `this`
             free(this.p);
             this.p = null;
         }
     }
     int bar(scope ref S a, scope int* b) {
         a.releaseBuffer();
         return *b; // use after free
     }
     S s;
     bar(s, s.p);

The root cause of the problem here is the call to `free()`. I 
_believe_ the solution is that `free()` (and equivalent functions 
of allocators as well as `delete`) must not accept scope 
parameters. More realistic candidates for such situations are 
destructors in combination with move semantics. Therefore, 
`~this()` needs to be marked as scope, too, for it to be callable 
on a borrowed object. If a scope object has a non-scope 
destructor, but no scope one, and is going to be destroyed, this 
needs to be a compile error. (Rust avoids that problem by making 
any object const while there are borrowed references, but this 
requires its complex borrow checker, which we should avoid for D.)

I also have a few ideas about owned types and move semantics, but 
this is mostly independent of borrowing (although, of course, 
it integrates nicely with it). So, that's it, for now. Sorry for 
the long text. Thoughts?