My Reference Safety System (DIP???)

via Digitalmars-d digitalmars-d at puremagic.com
Wed Feb 25 13:26:31 PST 2015


I haven't yet had much time to look at it closely, but I'll 
already make some initial comments.

On Wednesday, 25 February 2015 at 01:12:15 UTC, Zach the Mystic 
wrote:
> Principle 3: Extra function and parameter attributes are the 
> tradeoff for great memory safety. There is no other way to 
> support both encapsulation of control flow (Principle 2) and 
> the separate-compilation model (indispensable to D). Function 
> signatures pay the price for this with their expanding size. I 
> try to create the new attributes for the rare case, as opposed 
> to the common one, so that they don't appear very often.

IIRC H.S. Teoh suggested a change to the compilation model. I 
think he wants to expand the minimal compilation unit to a 
library or executable. In that case, inference for all kinds of 
attributes will be available in many more circumstances; explicit 
annotation would only be necessary for exported symbols.

Anyway, it is a good idea to enable scope semantics implicitly 
for all references involved in @safe code. As far as I understand 
it, this is something you suggest, right? It will eliminate 
annotations except in cases where a parameter is returned, which 
- as you note - will probably be acceptable, because it's already 
been suggested in DIP25.

>
> Principle 4: Scopes. My system has its own notion of scopes. 
> They are compile time information, used by the compiler to 
> ensure safety. Every declaration which holds data at runtime 
> must have a scope, called its "declaration scope". Every 
> reference type (defined below in Principle 6) will have an 
> additional scope called its "reference scope". A scope consists 
> of a very short bit array, with a minimum of approximately 16 
> bits and reasonable maximum of 32, let's say. For this proposal 
> I'm using 16, in order to emphasize this system's memory 
> efficiency. 32 bits would not change anything fundamental, only 
> allow the compiler to be a little more precise about what's 
> safe and what's not, which is not a big deal since it 
> conservatively defaults to @system when it doesn't know.

This bitmask seems to be mostly an implementation detail. AFAIU, 
further below you're introducing some things that make it visible 
to the user. I'm not convinced this is a good idea; it looks 
complicated for sure.

I also think it is too coarse. Even variables declared at the 
same lexical scope have different lifetimes, because they are 
destroyed in reverse order of declaration. This is relevant if 
they contain references and have destructors that access the 
references; we need to make sure that no reference to a destroyed 
variable can be kept in a variable whose destructor hasn't yet 
run.

>
> So what are these bits? Reserve 4 bits for an unsigned integer 
> (range 0-15) I call "scopedepth". Scopedepth is easier for me 
> to think about than lifetime, of which it is simply the 
> inverse, with (0) scopedepth being infinite lifetime, 1 having 
> a lifetime at function scope, etc. Anyway, a declaration's 
> scopedepth is determined according to logic similar to that found 
> in DIP69 and Mark Schutz's proposal:
>
> int r; // declaration scopedepth(0)
>
> void fun(int a /*scopedepth(0)*/) {

(Already pointed out by deadalnix.) Why do parameters have the 
same depth as globals?

>   int b; // depth(1)
>   {
>     int c; // depth(2)
>     {
>       int d; // (3)
>     }
>     {
>       int e; // (3)
>     }
>   }
>   int f; // (1)
> }
>
> Principle 5: It's always un-@safe to copy a declaration scope 
> from a higher scopedepth to a reference variable stored at 
> lower scopedepth. DIP69 tries to banish this type of thing only 
> in `scope` variables, but I'm not afraid to banish it in all 
> @safe code period:

For backwards compatibility reasons, it might be better to 
restrict it to `scope` variables. But as all references in @safe 
code should be implicitly `scope`, this would mostly have the 
same effect.

> Principle 6: Reference variables: Any data which stores a 
> reference is a "reference variable". That includes any pointer, 
> class instance, array/slice, `ref` parameter, or any struct 
> containing any of those. For the sake of simplicity, I boil 
> _all_ of these down to "T*" in this proposal. All reference 
> types are effectively the _same_ in this regard. DIP25 does not 
> indicate that it has any interest in expanding beyond `ref` 
> parameters. But all reference types are unsafe in exactly the 
> same way as `ref` is. (By the way, see footnote [1] for why I 
> think `ref` is much different from `scope`). I don't understand 
> the restriction of DIP25 to `ref` parameters only. Part of my 
> system is to expand the `return` annotation to all reference types.

Fully agree with the necessity to apply it to all kinds of 
references, of course.

> Principle 8: Any time a reference is copied, the reference
   ^^^^^^^^^^^
   Principle 7 ?
> scope inherits the *maximum* of the two scope depths:
>
> T* gru() {
>   static T st; // decl depth(0)
>   T t; // decl depth(1)
>   T* tp = &t; // ref depth(1)
>   tp = &st; // ref depth STILL (1)
>   return tp; // error!
> }
>
> If you have ever loaded a reference with a local scope, it 
> retains that scope level permanently, ensuring the safety of 
> the reference.

Why is this rule necessary? Can you show an example of what could 
go wrong without it? I assume it's just there to ease implementation 
(avoids the need for data flow analysis)?

> T* fun(T* a, T* b, T** c) {
>   // the function's "return scope" accumulates `a` here
>   return a;
>   T* d = b; // `d's reference scope accumulates `b`
>
>   // the return scope now accumulates `b` from `d`
>   return d;
>
>   *c = d; // now mutable parameter `c` gets `d`
>
>   static T* t;
>   *t = b; // this might be safe, but only the caller can know
> }
>
> All this accumulation results in the implicit function 
> signature:
>
> T* fun(return T* a, // DIP25
>        return noscope T* b, // DIP25 and DIP71
>        out!b T** c  // from DIP71
>        ) @safe;

I suppose that's about attribute inference?

> Principle 10: You'll probably have noticed that all scopes 
> accumulate each other according to lexical ordering, and that's 
> good news, because any sane person assigns and returns 
> references in lexical order.

As you say, that's broken. But why does it need to be in lexical 
order in the first place? I would simply analyze the entire 
function first, assign reference scopes, and disallow circular 
relations (like `a = b; b = a;`).

> Conclusion
>
> 1. With this system as foundation, an effective ownership 
> system is easily within reach. Just confine the outgoing scopes 
> to a single parameter and no globals, and you have your 
> ownership. You might need another (rare) function attribute to 
> help with this, and a storage class (e.g. `scope`, `unique`) to 
> give you an error when you do something wrong, but the 
> groundwork is 90% laid.

It's not so simple at all. For full-blown unique ownership, there 
needs to be some kind of borrow-checking like in Rust. I have 
some ideas for how a simple borrow-checker could be implemented 
without much work (i.e. without the data flow analysis that Rust 
uses). It's 
basically my "const borrowing" idea (whose one flaw incidentally 
cannot be triggered by unique types, because it is conditioned on 
the presence of aliasing).

There are still some things in the proposal that I'm sure can be 
simplified. We probably don't need new keywords like `noscope`. 
I'm not even sure the concept itself is needed.

That all said, I think you're on the right track. The fact that 
you don't require a new type modifier will make Walter very 
happy. This looks pretty good!

