Escape analysis (full scope analysis proposal)

Sun Nov 9 07:10:03 PST 2008

Michel Fortin wrote:
> I'd like to point out the two things people complained the most 
> about regarding the automatic dynamic allocation for dynamic closures:
> 
> 1.    There is no way to prevent it, to make sure there is no allocation.
> 2.    The compiler does allocate a lot more than necessary.
> 
> In my proposal, these two points are addressed:
> 
> 1.    You can declare any variable as "scope", preventing it from being
>     placed in a broader scope, preventing at the same time dynamic
>     allocation.
> 2.    The compiler being aware of what arguments do and do not escape the
>     scope of the called functions, it won't allocate unnecessarily.
> 
> So I think the situation would be much better.

I agree that an escape analyzer would improve things. I am not sure that 
one oblivious to regions is expressive enough.
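
To make "escaping" concrete for the closure case, here is a minimal sketch. It is written in C++ rather than D, only because C++ is the comparison language elsewhere in this thread. The function names are invented for the example, and the by-value capture merely stands in for the heap allocation a dynamic closure would need; it is not a description of how the D compiler works.

```cpp
#include <functional>

// Hypothetical helper: `f` is only called during this function, so
// whatever it captured never escapes the caller's frame.
int apply_twice(const std::function<int(int)>& f, int x) {
    return f(f(x));
}

// The closure does not escape: capturing `step` by reference is fine
// because `step` outlives every call made through the closure.
int non_escaping() {
    int step = 3;
    auto add_step = [&step](int v) { return v + step; };
    return apply_twice(add_step, 1);
}

// The closure escapes by being returned, so its captured state must
// outlive this frame. Capturing by value copies `step` into the closure,
// playing the role of the allocation discussed above.
std::function<int(int)> escaping() {
    int step = 3;
    return [step](int v) { return v + step; };
}
```

An analyzer that knows `apply_twice` does not retain its argument could keep the first closure entirely on the stack; only the second one needs storage that outlives its frame.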

> But all this is orthogonal to whether or not we have an escape analysis 
> system, as we could choose the reverse conventions: no variable can escape 
> its scope unless explicitly authorized by some new syntactic construct.

It's not orthogonal. Whatever the default is, you must be able to 
enforce escaping rules, otherwise the system would be as good as a 
convention.

>>> If you want to make sure x never escapes the memory region associated 
>>> to its scope, then you can declare x as scope and get a compile-time 
>>> error when assigning it to p.
>>>
>>> So, in essence, the system I propose is a little simpler because 
>>> pointer variables just cannot point to values coming from a region 
>>> that doesn't exist in the scope where the pointer is declared. The 
>>> guarantee I propose is that during the whole lifetime of a pointer, it 
>>> points to either a valid memory region, or null. Cyclone's approach is to 
>>> forbid you from dereferencing the pointer.
>>>
>>> Combine this with my proposal to not have dynamic regions and we 
>>> don't need named regions anymore. Perhaps the syntax could be made 
>>> simpler with region names, but technically, we don't need them as we 
>>> can always go the route of saying that a pointer value is "valid 
>>> within the scope of variable_x". This is what I'm expressing with 
>>> "scopeof(variable_x)" in my other examples, and I believe it is 
>>> analogous to the "regions_of(variable_x)" in Cyclone, although 
>>> Cyclone doesn't use it pervasively.
>>
>> IMHO this may be made to work. I personally prefer the system in which 
>> ref is safe and pointers are permissive. The system you are referring 
>> to makes ref and pointer of the same power, so we could as well 
>> dispense with either.
> 
> I'm not too thrilled by references. I once got a question from someone 
> coming from C: what is the difference between a pointer and a reference 
> in C++? I had to answer: references are pointers with a different 
> syntax, no rebindability, and no possibility of being null. It seems he 
> and I both agree that references are mostly a cosmetic patch to solve a 
> syntactic problem. References in D aren't much different.

I disagree. References in D are very different. They are not type 
constructors. They are storage classes that can only be used in function 
signatures, which makes them impossible to dangle. I think C++ 
references would also have been much better off as storage classes 
instead of half-life types.
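
A small C++ sketch of the distinction, offered as an illustration only (the names `Holder`, `reset`, and the two demo functions are made up here). Because `int&` is a type in C++, a reference can be stored inside an object, and that object can outlive what the reference points to. A reference that exists only as a function parameter is bound for the duration of the call and nothing more.

```cpp
// In C++ a reference is a type, so it can become part of an object's state.
struct Holder {
    int& target;  // legal C++; dangles if Holder outlives the referent
};

// A reference that appears only in a signature lives exactly as long as
// the call, so the caller's variable is guaranteed to outlive it.
void reset(int& x) { x = 0; }

int stored_reference_demo() {
    int value = 5;
    Holder h{value};  // the reference is now stored in `h`
    h.target = 7;     // writes through to `value`
    return value;
}

int parameter_reference_demo() {
    int value = 5;
    reset(value);     // the reference is gone once reset returns
    return value;
}
```

Nothing stops a `Holder` from being returned or stored past the lifetime of `value`, which is precisely the hazard that restricting references to function signatures rules out.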

> If we could have a unified syntax for pointers of all kinds, I think 
> it'd be more convenient than having two kinds of pointers. A 
> null-forbidding but rebindable pointer would be more useful in my opinion 
> than the current reference concept.

Well, ref means "This function wants to modify its argument". That is a 
very different charter from what pointers mean. So I'm not sure how you 
can say you'd much prefer one to the other. They are not comparable.
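
For readers trying to picture the "null-forbidding but rebindable pointer" being discussed, here is a rough C++ sketch. `NonNull` is a hypothetical wrapper invented for this example, not an existing library type, and it checks for null at run time, whereas the discussion in this thread is about compile-time guarantees.

```cpp
#include <stdexcept>

// Hypothetical wrapper: a pointer that may be rebound but is never null.
template <typename T>
class NonNull {
public:
    explicit NonNull(T* p) : ptr_(check(p)) {}

    // Rebinding is allowed, which a reference does not offer.
    void rebind(T* p) { ptr_ = check(p); }

    T& operator*() const { return *ptr_; }
    T* operator->() const { return ptr_; }

private:
    // Rejects null so the invariant holds for the wrapper's whole lifetime.
    static T* check(T* p) {
        if (p == nullptr) throw std::invalid_argument("NonNull: null pointer");
        return p;
    }

    T* ptr_;
};
```

Such a type can be stored, copied, and pointed at something else, so it carries none of the "this function modifies its argument" intent that a ref parameter expresses.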

>> But I'd be curious what others think of it. Notice how the discussion 
>> participants got reduced to you and me, and from what I saw that's not 
>> a good sign.
> 
> Indeed. I'm interested in other opinions too.
> 
> But I'm under the impression that many lost track of what was being 
> discussed, especially since we started referring to Cyclone, which few 
> are familiar with, and probably few have read the paper.

In my experience, when someone is interested in something, she makes 
time for it. So I take that as lack of interest. And hey, since when was 
lack of expertise a real deterrent? :o)

> One of the fears expressed at the start of the thread was about 
> excessive need for annotation, but as the Cyclone paper says, with good 
> defaults, you need to add scoping annotations only in a few specific 
> places. (It took me some time to read the paper and start discussing 
> things sanely after that, remember?) So perhaps we could get more people 
> involved if we could propose a tangible syntax for it.

To be very frank, I think we are very far from having an actual 
proposal, and syntax is of very low priority now if you want to put one 
together. Right now what we have is a few vague ideas and conjectures 
(e.g., there's no need for named regions because the need would be rare 
enough to require dynamic allocation for those cases). I'm not saying 
that to criticize, but merely to underline the difficulties.

> Or perhaps not; for advanced programmers who already understand well 
> what can and cannot be done by passing pointers around, full escape 
> analysis may not seem to be so interesting a gain since they've already 
> adopted the right conventions to avoid most bugs it would prevent. And 
> most people here who can discuss this topic with some confidence are not 
> newbies to programming and don't make too many mistakes of the sort 
> anymore.
> 
> Which makes me think of beginners saying pointers are hard. You've 
> certainly seen beginners struggle as they learn how to correctly use 
> pointers in C or C++. Making sure their programs fail at compile-time, 
> with an explanatory error message as to why they mustn't do this or 
> that, is certainly going to help their experience learning the language 
> more than cryptic and frustrating segfaults and access violations at 
> runtime, sometimes far from the source of the problem.

I totally agree that pointers are hard and good static checking for them 
would help. Currently, what we try to do is obviate the need for 
pointers in most cases, and to actually forbid them in safe modules. The 
question that remains is, how many unsafe modules are necessary, and 
what liability do they entail? If there are few and not too unwieldy, 
maybe we can declare victory without constructing an escape analyzer. I 
agree if you or anyone says they don't think so. At this point, I am not 
sure, but what I can say is that it's good to reduce the need for 
pointers regardless.

Andrei