My Reference Safety System (DIP???)
via Digitalmars-d
digitalmars-d at puremagic.com
Wed Feb 25 13:26:31 PST 2015
I haven't yet had much time to look at it closely, but
I'll already make some comments.
On Wednesday, 25 February 2015 at 01:12:15 UTC, Zach the Mystic
wrote:
> Principle 3: Extra function and parameter attributes are the
> tradeoff for great memory safety. There is no other way to
> support both encapsulation of control flow (Principle 2) and
> the separate-compilation model (indispensable to D). Function
> signatures pay the price for this with their expanding size. I
> try to create the new attributes for the rare case, as opposed
> to the common one, so that they don't appear very often.
IIRC H.S. Teoh suggested a change to the compilation model. I
think he wants to expand the minimal compilation unit to a
library or executable. In that case, inference for all kinds of
attributes will be available in many more circumstances; explicit
annotation would only be necessary for exported symbols.
Anyway, it is a good idea to enable scope semantics implicitly
for all references involved in @safe code. As far as I understand
it, this is something you suggest, right? It will eliminate
annotations except in cases where a parameter is returned, which
- as you note - will probably be acceptable, because it's already
been suggested in DIP25.
>
> Principle 4: Scopes. My system has its own notion of scopes.
> They are compile time information, used by the compiler to
> ensure safety. Every declaration which holds data at runtime
> must have a scope, called its "declaration scope". Every
> reference type (defined below in Principle 6) will have an
> additional scope called its "reference scope". A scope consists
> of a very short bit array, with a minimum of approximately 16
> bits and a reasonable maximum of 32, let's say. For this proposal
> I'm using 16, in order to emphasize this system's memory
> efficiency. 32 bits would not change anything fundamental, only
> allow the compiler to be a little more precise about what's
> safe and what's not, which is not a big deal since it
> conservatively defaults to @system when it doesn't know.
This bitmask seems to be mostly an implementation detail. AFAIU,
further below you're introducing some things that make it visible
to the user. I'm not convinced this is a good idea; it looks
complicated for sure.
I also think it is too coarse. Even variables declared at the
same lexical scope have different lifetimes, because they are
destroyed in reverse order of declaration. This is relevant if
they contain references and have destructors that access the
references; we need to make sure that no reference to a destroyed
variable can be kept in a variable whose destructor hasn't yet
run.
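To illustrate with a sketch of my own (type and member names are hypothetical, not from the proposal):

```d
// Sketch (mine): two variables at the same lexical depth, but with
// different lifetimes, because destruction runs in reverse order
// of declaration.
struct Holder {
    int* p;
    ~this() { if (p) *p = 0; } // touches the reference on destruction
}

void fun() {
    Holder g;   // declared first, so destroyed *last*
    int b;      // declared second, so destroyed *first*
    g.p = &b;   // same depth(1) on both sides, so a depth-only
                // check accepts it; yet `g`'s destructor runs
                // after `b` is already gone
}
```

So within one lexical scope the relative declaration order matters, which a single per-scope depth cannot express.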
>
> So what are these bits? Reserve 4 bits for an unsigned integer
> (range 0-15) I call "scopedepth". Scopedepth is easier for me
> to think about than lifetime, of which it is simply the
> inverse, with (0) scopedepth being infinite lifetime, 1 having
> a lifetime at function scope, etc. Anyway, a declaration's
> scopedepth is determined according to logic similar to that found
> in DIP69 and Marc Schütz's proposal:
>
> int r; // declaration scopedepth(0)
>
> void fun(int a /*scopedepth(0)*/) {
(Already pointed out by deadalnix.) Why do parameters have the
same depth as globals?
>     int b; // depth(1)
>     {
>         int c; // depth(2)
>         {
>             int d; // (3)
>         }
>         {
>             int e; // (3)
>         }
>     }
>     int f; // (1)
> }
>
> Principle 5: It's always un-@safe to copy a declaration scope
> from a higher scopedepth to a reference variable stored at
> lower scopedepth. DIP69 tries to banish this type of thing only
> in `scope` variables, but I'm not afraid to banish it in all
> @safe code period:
For backwards compatibility reasons, it might be better to
restrict it to `scope` variables. But as all references in @safe
code should be implicitly `scope`, this would mostly have the
same effect.
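Concretely, the kind of code this would reject in all @safe code (example mine):

```d
// Sketch (mine): a reference stored at depth(1) receives the address
// of a declaration at depth(2).
void fun() {
    int* p;      // reference variable declared at depth(1)
    {
        int x;   // declaration at depth(2)
        p = &x;  // banned: copies a depth(2) declaration scope
                 // into a reference stored at depth(1)
    }
    // `p` would otherwise dangle here
}
```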
> Principle 6: Reference variables: Any data which stores a
> reference is a "reference variable". That includes any pointer,
> class instance, array/slice, `ref` parameter, or any struct
> containing any of those. For the sake of simplicity, I boil
> _all_ of these down to "T*" in this proposal. All reference
> types are effectively the _same_ in this regard. DIP25 does not
> indicate that it has any interest in expanding beyond `ref`
> parameters. But all reference types are unsafe in exactly the
> same way as `ref` is. (By the way, see footnote [1] for why I
> think `ref` is much different from `scope`). I don't understand
> the restriction of DIP25 to `ref` parameters only. Part of my
> system is to expand `return` parameters to all reference types.
Fully agree with the necessity to apply it to all kinds of
references, of course.
> Principle 8: Any time a reference is copied, the reference
^^^^^^^^^^^
Principle 7 ?
> scope inherits the *maximum* of the two scope depths:
>
> T* gru() {
>     static T st; // decl depth(0)
>     T t;         // decl depth(1)
>     T* tp = &t;  // ref depth(1)
>     tp = &st;    // ref depth STILL (1)
>     return tp;   // error!
> }
>
> If you have ever loaded a reference with a local scope, it
> retains that scope level permanently, ensuring the safety of
> the reference.
Why is this rule necessary? Can you show an example what could go
wrong without it? I assume it's just there to ease implementation
(avoids the need for data flow analysis)?
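My guess is it's about cases like this (example mine, reusing the `T` from the quoted code), where the assigned scope depends on control flow:

```d
struct T {} // stand-in, as in the proposal's examples

T* gru(bool cond) {
    static T st;   // decl depth(0)
    T t;           // decl depth(1)
    T* tp = &st;   // ref depth(0)
    if (cond)
        tp = &t;   // ref depth becomes max(0, 1) = 1, permanently
    return tp;     // rejected, even though it is safe whenever
                   // `cond` is false; proving that would require
                   // tracking which branch actually ran
}
```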
> T* fun(T* a, T* b, T** c) {
>     // the function's "return scope" accumulates `a` here
>     return a;
>
>     T* d = b; // `d`'s reference scope accumulates `b`
>
>     // the return scope now accumulates `b` from `d`
>     return d;
>
>     *c = d; // now mutable parameter `c` gets `d`
>
>     static T* t;
>     *t = b; // this might be safe, but only the caller can know
> }
>
> All this accumulation results in the implicit function
> signature:
>
> T* fun(return T* a,         // DIP25
>        return noscope T* b, // DIP25 and DIP71
>        out!b T** c          // from DIP71
>        ) @safe;
I suppose that's about attribute inference?
> Principle 10: You'll probably have noticed that all scopes
> accumulate each other according to lexical ordering, and that's
> good news, because any sane person assigns and returns
> references in lexical order.
As you say, that's broken. But why does it need to be in lexical
order in the first place? I would simply analyze the entire
function first, assign reference scopes, and disallow circular
relations (like `a = b; b = a;`).
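I.e. something like this (sketch mine) would be rejected by the whole-function analysis:

```d
struct T {} // stand-in type, as in the proposal's examples

// Sketch (mine): assign reference scopes in one pass over the whole
// function body, then reject any circular dependency between them.
void fun(ref T* a, ref T* b) {
    a = b; // `a`'s reference scope accumulates `b`'s
    b = a; // `b`'s accumulates `a`'s: circular, so disallowed
}
```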
> Conclusion
>
> 1. With this system as foundation, an effective ownership
> system is easily within reach. Just confine the outgoing scopes
> to a single parameter and no globals, and you have your
> ownership. You might need another (rare) function attribute to
> help with this, and a storage class (e.g. `scope`, `unique`) to
> give you an error when you do something wrong, but the
> groundwork is 90% laid.
It's not so simple at all. For full-blown unique ownership, there
needs to be some kind of borrow-checking like in Rust. I have
some ideas how a simple borrow-checker can be implemented without
much work (without data flow analysis as Rust does). It's
basically my "const borrowing" idea (whose one flaw incidentally
cannot be triggered by unique types, because it is conditioned on
the presence of aliasing).
There are still some things in the proposal that I'm sure can be
simplified. We probably don't need new keywords like `noscope`.
I'm not even sure the concept itself is needed.
That all said, I think you're on the right track. The fact that
you don't require a new type modifier will make Walter very
happy. This looks pretty good!