My Reference Safety System (DIP???)

Thu Feb 26 13:33:50 PST 2015


On Thursday, 26 February 2015 at 17:56:14 UTC, Zach the Mystic 
wrote:
> On Wednesday, 25 February 2015 at 21:26:33 UTC, Marc Schütz 
> wrote:
>> IIRC H.S. Teoh suggested a change to the compilation model. I 
>> think he wants to expand the minimal compilation unit to a 
>> library or executable. In that case, inference for all kinds 
>> of attributes will be available in many more circumstances; 
>> explicit annotation would only be necessary for exported 
>> symbols.
>
> You probably mean Dicebot:
>
> http://forum.dlang.org/post/otejdbgnhmyvbyaxatsk@forum.dlang.org

You're right! And I just (again wrongly) implicated Martin Nowak 
in this, too :-P

>
>> Anyway, it is a good idea to enable scope semantics implicitly 
>> for all references involved in @safe code. As far as I 
>> understand it, this is something you suggest, right? It will 
>> eliminate annotations except in cases where a parameter is 
>> returned, which - as you note - will probably be acceptable, 
>> because it's already been suggested in DIP25.
>
> Actually you could eliminate `return` parameters as well, I 
> think. If the compiler has the body of a function, which it 
> usually does, then there shouldn't be a need to mark *any* of 
> the covariant function or parameter attributes. I think it's 
> the kind of thing which should "Just Work" in all these cases.

Agreed. I had the export/import case in mind, where you don't 
have the function body. The signature then needs to contain 
`return` parameters, although `scope` would be implied by `@safe`.
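
For instance (hypothetical code; `return` placed as in DIP25, with 
`choose` and `T` standing in for any function and struct type, as 
in the examples further down), a declaration without an available 
body would look like:

struct T;  // opaque, for illustration

// Body not available (e.g. a .di interface declaration), so nothing
// can be inferred: `scope` on the parameters would be implied by
// @safe, but `return` has to be spelled out explicitly.
T* choose(return T* a, T* b) @safe;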

>> I also think it is too coarse. Even variables declared at the 
>> same lexical scope have different lifetimes, because they are 
>> destroyed in reverse order of declaration. This is relevant if 
>> they contain references and have destructors that access the 
>> references; we need to make sure that no reference to a 
>> destroyed variable can be kept in a variable whose destructor 
>> hasn't yet run.
>
> It might be too coarse. We could reserve a few more bits for 
> depth-constant declaration order. At the same, time, it doesn't 
> seem *that* urgent to me. But maybe I'm naive about this. 
> Everything is being destroyed anyway, so what's the real danger?

struct A {
     B* b;
     ~this() {
         b.doSomething();
     }
}

struct B {
     void doSomething();
}

void foo() {
     A a;      // declscope(1)
     B b;      // declscope(1)
     a.b = &b; // refscope(1) <= declscope(1): OK
     // end of scope:
     // `b` is destroyed
     // `a`'s destructor is called
     // => you're calling a method on a destroyed object
}

Basically, every variable needs to get its own declscope; all 
declscopes form a strict hierarchy (no partial overlaps).
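
To illustrate with the example above (the numbering is only 
illustrative; `A` and `B` as defined there): `b` is declared after 
`a` and therefore destroyed first, so its declscope has to count 
as strictly deeper than `a`'s, which makes the bad assignment 
visible.

void foo() {
     A a;      // declscope(1)
     B b;      // declscope(2): nested in declscope(1), destroyed first
     a.b = &b; // refscope(2) > declscope(1): rejected in @safe code,
               // since `a`'s destructor would call into a destroyed `b`
}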

>
>>> Principle 5: It's always un-@safe to copy a declaration scope 
>>> from a higher scopedepth to a reference variable stored at 
>>> lower scopedepth. DIP69 tries to banish this type of thing 
>>> only in `scope` variables, but I'm not afraid to banish it in 
>>> all @safe code period:
>>
>> For backwards compatibility reasons, it might be better to 
>> restrict it to `scope` variables. But as all references in 
>> @safe code should be implicitly `scope`, this would mostly 
>> have the same effect.
>
> I guess this is the "Language versus Legacy" issue. I think D's 
> strength is in its language, not its huge legacy codebase. 
> Therefore, I find myself going with the #pleasebreakourcode 
> crowd, for the sake of extending D's lead where it shines.

I am too, actually, but it would be a really hard sell.

> I'm not sure all references in safe code need to be `scope` - 
> that would break a lot of code in and of itself, right?

Not sure how much would be affected. I actually suspect that most 
of it already behaves as if it were scope, with the exception of 
newly allocated memory. But those should ideally be "owned" 
instead.

But you're right, there still needs to be an opt-out possibility, 
most likely `static`.

>>> Principle 10: You'll probably have noticed that all scopes 
>>> accumulate each other according to lexical ordering, and 
> that's good news, because any sane person assigns and returns 
>>> references in lexical order.
>>
>> As you say, that's broken. But why does it need to be in 
>> lexical order in the first place? I would simply analyze the 
>> entire function first, assign reference scopes, and disallow 
>> circular relations (like `a = b; b = a;`).
>
> T* fun(T* a, T** b) {
>   T* c = new T;
>   c = a;
>   *b = c;
>   return c;
> }

Algorithm for inference of ref scopes (= parameter annotations):

1) Each variable, parameter, and the return value gets a ref 
scope (or ref depth). A ref scope can be either another variable 
(including `return` and `this`) or `static`.

2) The initial ref scope of a variable is the variable itself.

3) Each time a variable (or something reachable through a 
variable) is assigned (returning counts as assignment to the 
return value), i.e. for each location in the function where an 
assignment happens, the new ref scope will be:

3a) the scope of the source, if it is larger than or equal to the 
old scope

3b) otherwise (for disjoint scopes, or assignment from a smaller 
to a larger scope), it is an error (it could potentially violate 
the guarantees)

4) If a source scope refers to a variable (other than the 
destination itself) for which not all assignments have been 
processed yet, the assignment is put into a queue, to be 
evaluated later. For code like `a = b; b = a;` there can be 
dependency cycles; such code will be disallowed.

How exactly the scope of a complex expression has to be computed 
is left open here.

In the end, if there was no error, all variables, parameters, and 
the return value will have a minimal reference scope assigned. If 
that scope is the variable itself, it can be inferred as `scope`. 
If it is a parameter, that parameter gets an `out!identifier` or 
`return` annotation.

Note that the order in which the "assignments" occur inside the 
function doesn't matter. This is more restrictive than strictly 
necessary, but it's certainly ok in most cases, easy to work 
around when not, and it doesn't require data/control flow 
analysis.

(By the way: inference cannot work for recursive functions.)
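
To make rules 3a/3b concrete, here is a tiny executable model 
(the names `Width`, `RefScope` and `assign` are mine, not part of 
any proposal; a real implementation would compare positions in 
the declscope hierarchy instead of this flattened "width"):

enum Width { local, parameter, static_ }

struct RefScope {
     Width width;
     string root;  // declaration the scope is rooted at; unused if static
}

// Rule 3: on assignment, the destination's ref scope may only stay
// the same or widen (3a); anything that would narrow it is an error (3b).
bool assign(ref RefScope dest, RefScope src) {
     if (src.width >= dest.width) {
         dest = src;   // 3a: take over the wider (or equal) source scope
         return true;
     }
     return false;     // 3b: narrowing/disjoint scopes are rejected
}

unittest {
     auto c = RefScope(Width.local, "c");          // rule 2: S(c) = c
     assert(assign(c, RefScope(Width.static_)));   // c = new T; => static
     assert(!assign(c, RefScope(Width.parameter, "a"))); // c = a; error
}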

Your example:

T* fun(T* a, T** b) {
     // => S(a) = a
     // => S(b) = b
     // => S(return) = <doesn't matter>
     T* c; // == (T*).init == null
     // => S(c) = c
     c = new T;
     // `new` returns static, which is wider than c
     // => S(c) = static
     c = a;
     // => invalid, narrowing not allowed
     // (this is what I asked about, and now I
     // see why it's necessary)
     // let's assume it didn't happen, so that
     // the next two statements work
     *b = c;
     // => S(b) = S(c) = static
     return c;
     // => S(return) = S(c) = static
}

This algorithm can also be modified slightly to allow only 
partial inference (only of some variables, e.g. locals, when the 
parameters have already been explicitly annotated), as well as 
for checking whether the assignments are valid in this case.

I'm a bit tired now, so maybe this contains glaring mistakes, but 
if so, I hope they can be fixed :-) I hope it's clear what I'm 
trying to do here.

Something else that needs consideration: What happens when 
parameters alias each other? I think it is ok, because the 
checking phase will naturally prohibit calling functions in a way 
that would break the guarantees, but I haven't thought it through 
completely.
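
For reference, a minimal instance of the kind of aliasing I mean 
(hypothetical names):

struct T { T* next; }

// Inference gives: S(a.next) must be at least S(b).
void link(T* a, T* b) @safe {
     a.next = b;
}

void caller() @safe {
     T* x = new T;
     link(x, x);   // both parameters refer to the same object
}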

>> It's not so simple at all. For full-blown unique ownership, 
>> there needs to be some kind of borrow-checking like in Rust. I 
>> have some ideas how a simple borrow-checker can be implemented 
>> without much work (without data flow analysis as Rust does). 
>> It's basically my "const borrowing" idea (whose one flaw 
>> incidentally cannot be triggered by unique types, because it 
>> is conditioned on the presence of aliasing).
>>
>> There are still some things in the proposal that I'm sure can 
>> be simplified. We probably don't need new keywords like 
>> `noscope`. I'm not even sure the concept itself is needed.
>
> Unless you want to flat out ban copying a parameter reference 
> to a global in @safe code, you will need `noscope`, or, as you 
> suggested, `static`.

You're right, it's necessary.
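
For concreteness (the annotation in the comment is hypothetical 
syntax from this discussion, not existing D):

struct T;   // opaque, for illustration
T* global;

// Copying a parameter reference into a global: the source scope (the
// parameter `p`) is narrower than `static`, so rule 3b would reject
// this in @safe code unless `p` is explicitly marked as carrying a
// heap/global reference.
void publish(/* static/noscope */ T* p) @safe {
     global = p;
}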

> I'm actually thinking of reusing `noscope` as a function 
> attribute (`@noscope` perhaps) which says that the function may 
> return a heap or global reference. This is all that's necessary 
> to complete an ownership system. If a scope has exactly 1 
> "mystery" bit set, and is known not to come from the heap or a 
> global, then you know that it *must* contain a reference to 
> exactly the parameter for which the mystery bit is set. You 
> know exactly what it contains == ownership.

I will have to think about this, but I believe you cannot express 
such concepts as deadalnix's islands, or "const borrowing". But 
maybe, if we're lucky, I'm wrong :-)

