My Reference Safety System (DIP???)

Thu Feb 26 09:56:09 PST 2015

On Wednesday, 25 February 2015 at 21:26:33 UTC, Marc Schütz wrote:
> IIRC H.S. Teoh suggested a change to the compilation model. I 
> think he wants to expand the minimal compilation unit to a 
> library or executable. In that case, inference for all kinds of 
> attributes will be available in many more circumstances; 
> explicit annotation would only be necessary for exported 
> symbols.

You probably mean Dicebot:

http://forum.dlang.org/post/otejdbgnhmyvbyaxatsk@forum.dlang.org

> Anyway, it is a good idea to enable scope semantics implicitly 
> for all references involved in @safe code. As far as I 
> understand it, this is something you suggest, right? It will 
> eliminate annotations except in cases where a parameter is 
> returned, which - as you note - will probably be acceptable, 
> because it's already been suggested in DIP25.

Actually you could eliminate `return` parameters as well, I 
think. If the compiler has the body of a function, which it 
usually does, then there shouldn't be a need to mark *any* of the 
covariant function or parameter attributes. I think it's the kind 
of thing which should "Just Work" in all these cases.
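
For comparison, D already infers attributes this way for templates, whose bodies are always available to the compiler. Below is a minimal sketch of that existing behaviour for `@safe`, `pure`, `nothrow` and `@nogc` (the names `twice` and `caller` are made up for illustration). Inferring `return` on parameters the same way is what I'm suggesting, and this snippet does not demonstrate that part:

```d
// A template: its attributes are inferred from the body,
// so nothing is annotated here.
T twice(T)(T x)
{
    return x + x;
}

// A non-template function has to spell its attributes out.
int caller(int x) @safe pure nothrow @nogc
{
    // This call only compiles because the compiler inferred
    // @safe, pure, nothrow and @nogc for twice!int.
    return twice(x);
}

void main()
{
    assert(caller(21) == 42);
}
```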

>> Principle 4: Scopes. My system has its own notion of scopes. 
>> They are compile time information, used by the compiler to 
>> ensure safety. Every declaration which holds data at runtime 
>> must have a scope, called its "declaration scope". Every 
>> reference type (defined below in Principle 6) will have an 
>> additional scope called its "reference scope". A scope 
>> consists of a very short bit array, with a minimum of 
>> approximately 16 bits and reasonable maximum of 32, let's say. 
>> For this proposal I'm using 16, in order to emphasize this 
>> system's memory efficiency. 32 bits would not change anything 
>> fundamental, only allow the compiler to be a little more 
>> precise about what's safe and what's not, which is not a big 
>> deal since it conservatively defaults to @system when it 
>> doesn't know.
>
> This bitmask seems to be mostly an implementation detail.

I guess I'm trying to win over the people who might think the 
system will cost too much memory or compilation time.

> AFAIU, further below you're introducing some things that make 
> it visible to the user.

The only things I'm making visible to the user are things which 
*must* appear in the function signature for the sake of the 
separate compilation model. Everything else would be invisible, 
except the occasional false positive, where something actually 
safe is thought unsafe (the solution being to enclose the 
statement in a @trusted block or lambda).
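
For reference, the @trusted lambda idiom looks roughly like this. The example is generic, not specific to my proposal: `thirdOrZero` is a hypothetical function in which the length check makes the raw pointer index safe, but the compiler cannot see that and rejects it in plain @safe code.

```d
int thirdOrZero(const(int)[] data) @safe
{
    if (data.length < 3)
        return 0;

    // Indexing a raw pointer is rejected in @safe code. The length
    // check above makes it safe here, so the single expression is
    // wrapped in an immediately-invoked @trusted lambda.
    return () @trusted { return data.ptr[2]; }();
}

void main() @safe
{
    assert(thirdOrZero([1, 2, 3]) == 3);
    assert(thirdOrZero([1]) == 0);
}
```

Keeping the @trusted part down to one expression leaves the rest of the function under the compiler's checks.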

> I'm not convinced this is a good idea; it looks complicated for 
> sure.

It's not that complicated. My main fear is that it's too simple! 
Some of the logic may seem complicated, but the goal is to make 
it possible to compile a function without having to visit any 
other function. Everything is figured out "in house".

> I also think it is too coarse. Even variables declared at the 
> same lexical scope have different lifetimes, because they are 
> destroyed in reverse order of declaration. This is relevant if 
> they contain references and have destructors that access the 
> references; we need to make sure that no reference to a 
> destroyed variable can be kept in a variable whose destructor 
> hasn't yet run.

It might be too coarse. We could reserve a few more bits for 
depth-constant declaration order. At the same time, it doesn't 
seem *that* urgent to me. But maybe I'm naive about this. 
Everything is being destroyed anyway, so what's the real danger?
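
To make the concern concrete, here is a small sketch (the types `Resource` and `Observer` and the `events` log are hypothetical) in which the variable declared first holds a pointer to the one declared second. Destruction runs in reverse declaration order, so the first variable's destructor runs after its target has already been destroyed. Here it only reads a flag; if `Resource`'s destructor had freed memory, the same access would be a use after free.

```d
string[] events; // record of what the destructors did

struct Resource
{
    string name;
    bool alive = true;

    ~this()
    {
        alive = false;
        events ~= "destroyed " ~ name;
    }
}

struct Observer
{
    Resource* target;

    ~this()
    {
        // Runs AFTER the target's destructor when the Observer
        // was declared before the Resource it points to.
        events ~= target.alive ? "observer saw live resource"
                               : "observer saw destroyed resource";
    }
}

void run()
{
    Observer obs;               // declared first, destroyed last
    auto res = Resource("res"); // declared second, destroyed first
    obs.target = &res;          // same lexical scope, shorter lifetime
}

void main()
{
    run();
    assert(events == ["destroyed res", "observer saw destroyed resource"]);
}
```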

>> Principle 5: It's always un@safe to copy a declaration scope 
>> from a higher scopedepth to a reference variable stored at 
>> lower scopedepth. DIP69 tries to banish this type of thing 
>> only in `scope` variables, but I'm not afraid to banish it in 
>> all @safe code period:
>
> For backwards compatibility reasons, it might be better to 
> restrict it to `scope` variables. But as all references in 
> @safe code should be implicitly `scope`, this would mostly have 
> the same effect.

I guess this is the "Language versus Legacy" issue. I think D's 
strength is in its language, not its huge legacy codebase. 
Therefore, I find myself going with the #pleasebreakourcode 
crowd, for the sake of extending D's lead where it shines. I'm 
not sure all references in safe code need to be `scope` - that 
would break a lot of code by itself, right?

>> Principle 8: Any time a reference is copied, the reference
>   ^^^^^^^^^^^
>   Principle 7 ?
>> scope inherits the *maximum* of the two scope depths:
>>
>> T* gru() {
>>  static T st; // decl depth(0)
>>  T t; // decl depth(1)
>>  T* tp = &t; // ref depth(1)
>>  tp = &st; // ref depth STILL (1)
>>  return tp; // error!
>> }
>>
>> If you have ever loaded a reference with a local scope, it 
>> retains that scope level permanently, ensuring the safety of 
>> the reference.
>
> Why is this rule necessary? Can you show an example what could 
> go wrong without it? I assume it's just there to ease 
> implementation (avoids the need for data flow analysis)?

You're right. It's only necessary when code is branching. My 
proposal could be amended as such.
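
One possible reading of that amendment, as a toy model of the compiler's bookkeeping and not real compiler code (`RefScope`, `assign`, `join` and `mayReturn` are all hypothetical names): a straight-line assignment replaces the reference scope, and the maximum is taken only where two branches meet.

```d
/// Hypothetical compile-time record for one reference variable.
struct RefScope
{
    uint depth; // 0 = static/global, 1 and up = nested local scopes
}

/// Straight-line assignment: the new source simply replaces the old one.
RefScope assign(RefScope source)
{
    return source;
}

/// Where two branches meet, keep the deeper (shorter-lived) scope.
RefScope join(RefScope a, RefScope b)
{
    return a.depth > b.depth ? a : b;
}

/// Only references at depth 0 may leave the function.
bool mayReturn(RefScope r)
{
    return r.depth == 0;
}

void main()
{
    auto st = RefScope(0); // static T st;
    auto t = RefScope(1);  // T t;

    // tp = &t; tp = &st;  (no branching: the last assignment wins)
    auto tp = assign(t);
    tp = assign(st);
    assert(mayReturn(tp));

    // tp = &st; if (cond) tp = &t;  (either value may reach the return)
    auto merged = join(assign(st), assign(t));
    assert(!mayReturn(merged));
}
```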

>> T* fun(T* a, T* b, T** c) {
>>  // the function's "return scope" accumulates `a` here
>>  return a;
>>  T* d = b; // `d`'s reference scope accumulates `b`
>>
>>  // the return scope now accumulates `b` from `d`
>>  return d;
>>
>>  *c = d; // now mutable parameter `c` gets `d`
>>
>>  static T* t;
>>  t = b; // this might be safe, but only the caller can know
>> }
>>
>> All this accumulation results in the implicit function 
>> signature:
>>
>> T* fun(return T* a, // DIP25
>>       return noscope T* b, // DIP25 and DIP71
>>       out!b T** c  // from DIP71
>>       ) @safe;
>
> I suppose that's about attribute inference?

Well, that, and in the absence of inference, errors in @safe 
functions.

>> Principle 10: You'll probably have noticed that all scopes 
>> accumulate each other according to lexical ordering, and 
>> that's good news, because any sane person assigns and returns 
>> references in lexical order.
>
> As you say, that's broken. But why does it need to be in 
> lexical order in the first place? I would simply analyze the 
> entire function first, assign reference scopes, and disallow 
> circular relations (like `a = b; b = a;`).

T* fun(T* a, T** b) {
   T* c = new T;
   c = a;
   *b = c;
   return c;
}

Both `b` and the "return scope" need to pick up that they are 
from `a` (the end result being the signature "T* fun(return T* a, 
out!a T** b);"). If `c` is returned first, the return scope will 
only inherit what c was declared with. It won't pick up that it 
also has `a`'s scope. What underlying mechanism would you have the 
compiler use to allow for these chains of references? (Note that 
I haven't yet suggested the final attribute which would imbue the 
return scope with heap or global references, and thus this 
possibility is not yet contained in the function signature.)
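
A rough sketch of the accumulation I have in mind for that function, as a toy model only. The bit assignment (`bitA` for parameter `a`), the `Summary` struct and both function names are assumptions made for illustration. Each statement is processed in lexical order and ORs the source's bits into the destination:

```d
enum ushort bitA = 1 << 0; // hypothetical "mystery" bit for parameter `a`

/// What the function signature would have to record.
struct Summary
{
    ushort returnScope; // bits reaching the return value
    ushort outB;        // bits written through parameter `b`
}

/// Statements processed in the order they are written.
Summary analyzeInOrder()
{
    Summary s;
    ushort c = 0;       // T* c = new T;  (no parameter bits yet)
    c |= bitA;          // c = a;
    s.outB |= c;        // *b = c;
    s.returnScope |= c; // return c;
    return s;
}

/// The same statements, but with `return c` looked at first.
Summary analyzeReturnFirst()
{
    Summary s;
    ushort c = 0;       // T* c = new T;
    s.returnScope |= c; // return c;  (sees only what c was declared with)
    c |= bitA;          // c = a;
    s.outB |= c;        // *b = c;
    return s;
}

void main()
{
    // In lexical order both outputs pick up `a`:
    // T* fun(return T* a, out!a T** b)
    assert(analyzeInOrder() == Summary(bitA, bitA));

    // Returning first loses the fact that the result may alias `a`.
    assert(analyzeReturnFirst().returnScope == 0);
}
```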

>> Conclusion
>>
>> 1. With this system as foundation, an effective ownership 
>> system is easily within reach. Just confine the outgoing 
>> scopes to a single parameter and no globals, and you have your 
>> ownership. You might need another (rare) function attribute to 
>> help with this, and a storage class (e.g. `scope`, `unique`) 
>> to give you an error when you do something wrong, but the 
>> groundwork is 90% laid.
>
> It's not so simple at all. For full-blown unique ownership, 
> there needs to be some kind of borrow-checking like in Rust. I 
> have some ideas how a simple borrow-checker can be implemented 
> without much work (without data flow analysis as Rust does). 
> It's basically my "const borrowing" idea (whose one flaw 
> incidentally cannot be triggered by unique types, because it is 
> conditioned on the presence of aliasing).
>
> There are still some things in the proposal that I'm sure can 
> be simplified. We probably don't need new keywords like 
> `noscope`. I'm not even sure the concept itself is needed.

Unless you want to flat out ban copying a parameter reference to 
a global in @safe code, you will need `noscope`, or, as you 
suggested, `static`. I'm actually thinking of reusing `noscope` 
as a function attribute (`@noscope` perhaps) which says that the 
function may return a heap or global reference. This is all 
that's necessary to complete an ownership system. If a scope has 
exactly 1 "mystery" bit set, and is known not to come from the 
heap or a global, then you know that it *must* contain a 
reference to exactly the parameter for which the mystery bit is 
set. You know exactly what it contains == ownership.
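
As a sketch, the check could be as small as the following. The function name, the 16-bit mask and the separate heap/global flag are assumptions for illustration; I haven't settled on an actual layout.

```d
/// True when the scope can only refer to one specific parameter:
/// exactly one "mystery" bit is set and the reference is known
/// not to come from the heap or a global.
bool identifiesSingleParameter(ushort mysteryBits, bool mayBeHeapOrGlobal)
{
    // A nonzero value with exactly one bit set has no bits in
    // common with (value - 1).
    const bool exactlyOneBit =
        mysteryBits != 0 && (mysteryBits & (mysteryBits - 1)) == 0;
    return exactlyOneBit && !mayBeHeapOrGlobal;
}

void main()
{
    assert(identifiesSingleParameter(0b0100, false));  // one parameter
    assert(!identifiesSingleParameter(0b0110, false)); // two candidates
    assert(!identifiesSingleParameter(0b0100, true));  // might be heap/global
    assert(!identifiesSingleParameter(0, false));      // no parameter at all
}
```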

> That all said, I think you're on the right track. The fact that 
> you don't require a new type modifier will make Walter very 
> happy. This looks pretty good!

Thanks.