Escape analysis (full scope analysis proposal)

Sun Nov 2 16:04:37 PST 2008

Michel Fortin wrote:
> On 2008-11-02 10:12:46 -0500, Andrei Alexandrescu 
> <SeeWebsiteForEmail at erdani.org> said:
> 
>> It looks like things will move that way. Bartosz, Walter and I talked 
>> a lot yesterday about it - a lot of crazy things were on the table! 
>> The next step is to make this a reference, which is highly related to 
>> escape analysis. At the risk of anticipating a bit an unfinalized 
>> design, here's what's on the table:
>>
>> * Continue an "anything goes" policy for *explicit* pointers, i.e. 
>> those written explicitly by user code with stars and stuff.
> 
> That's a little disapointing. I was hoping for something to fix all 
> holes. I know it isn't easy to design and implement, but once done I 
> firmly believe it would have the potential to completely eliminate the 
> need for explicit memory allocation. For the programmer, it's a good 
> trade: less worrying about what needs to be dynamically allocated and 
> better documented function signatures.
> 
> Perhaps that would be too much of a departure from C and C++ though.

That's only the half of it. If you want to take a look at a C-like 
language that is safe, you may want to look at Cyclone. The reality is 
that making things 100% safe are going to require more or less the moral 
equivalent of Cyclone's limitations and demands from its user. I think 
Dan Grossman has done an excellent job making things "as tight as 
possible but not tighter", so Cyclone is a great yardstick to measure 
D's tradeoffs against.

>> * Disallow pointers in SafeD.
> 
> Again a consequence of not having a full scoping solution.

A "full scoping solution" would impose demands on you that you'd be the 
first to dislike.

> Couldn't you allow pointers in SafeD, while disallowing taking the 
> address of local variables? This would limit pointers to heap-allocated 
> variables. And disallow pointer arithmetic too.

I think pointers can be allowed in SafeD under certain restrictions 
starting with the ones you mention. We best start from the safe end.

>> * Make all ref parameters scoped by default. There will be impossible 
>> for a function to escape the address of a ref parameter without a 
>> cast. I haven't proved it to myself yet, but I believe that if 
>> pointers are not used and with the amendments below regarding arrays 
>> and delegates, this makes things entirely safe. In Walter's words, "it 
>> buttons things pretty tight".
> 
> If this means you can't implement a swap function for this struct, then 
> I think you're right that it's safe:
> 
>     struct A
>     {
>         ref A a;
>     }
> 
>     void swap(ref A a0, ref A a1);
> 
> On the other side, if you can implement the swap function, then calling 
> it is unsafe since you can rebind a reference to another without being 
> able to check that their scopes are compatible.

Swap will work fine because ref is not a type constructor. Struct A is 
in error. In fact ref not being a type constructor is much of the beauty 
of it all.

> So basically, references must always be initialized at construction and 
> should be non-rebindable, just like in C++. (Hum, and I should mention I 
> don't like too much references in C++.)

No, C++ references are "almost" type constructors. Also note that 
rvalues won't bind to any kind of references in D. (More on that later.)

>> * Make this a reference so that it obeys what references obey.
> 
> Ah, so that's why Walter wanted to change that suddenly. This is a good 
> thing by itself, even without correct scoping.

Yah, in fact it's pretty amazing it seems to work out so well. We gain a 
huge guarantee without changing much in the language.

>> * If people want to implement e.g. linked lists, they should do it 
>> with classes. Implementing them with structs will require casts to 
>> obtain and escape &this. That also means they'd be using pointers, so 
>> anything goes - pointers are not restricted from escaping.
>>
>> * There are two cases in which things escape without the user 
>> explicitly using pointers: delegates and dynamic arrays initialized 
>> from stack-allocated arrays.
>>
>> * For delegates require the scope keyword in the signature of the 
>> callee. A scoped delegate cannot be stored, only called or passed down 
>> to another function that in turn takes a scoped delegate. This makes 
>> scope delegates entirely safe. Non-scoped delegates use dynamic 
>> allocation.
> 
> Again, I'd say that if you can implement a swap function with those 
> scope delegates, it's unsafe. Case in point:
> 
>     void f1(ref scope void delegate() arg)
>     {
>         int i;
>         scope void f2()
>         {
>             ++i;
>         }
> 
>         scope void delegate() inner = &f2;
>         swap(arg, inner); // this should be an error.
>         arg = inner; // this too should be an error.
>     }
> 
> If you can't rebind a the value of a scope delegate pointer, then all is 
> fine.

Indeed, rebinding would be disallowed.

>> * We don't have an idea for dynamic arrays initialized from 
>> stack-allocated arrays.
> 
> Either disallow it, either keep it as unsafe as pointers (bad for SafeD 
> I expect), or implement a complete scope-checking system (if you do it 
> for arrays, you'll have done it for pointers too). You don't have much 
> choice there, as arrays are pretty much the same thing as pointers.

Exactly. Essentially array are as "bad" as structs containing pointers.

>> Thoughts? Ideas?
> 
> I'm under the impression that scope classes could be dangerous in this 
> system: an object reference is not necessarly on the heap.

I think a fair move to do is deal away with scope classes. We can still 
allow them via systems-level tricks, but not with an innocuous construct 
that's in fact a weapon of mass destruction.

> Personally, I'd have liked to have a language where you can be 
> completely scope safe, where you could document interfaces so they know 
> the scope they're evolving in. This concept of something in between is a 
> nice attempt at a compromize, but I find it somewhat limitting.

I agree. Again, something like this was on the table:

void wyda(scope T* a, scope U* b)
         if (scope(a) <= scope(b)
{
     a.field = b;
}

I think it's not hard to appreciate the toll this kind of user-written 
function summary exacts on the user of the language.

Andrei