Escape analysis (full scope analysis proposal)

Michel Fortin michel.fortin at michelf.com
Sun Nov 9 05:16:41 PST 2008


On 2008-11-06 23:36:55 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail at erdani.org> said:

> Well how about this:
> 
> int * p;
> float * q;
> if (condition) {
>      int x; p = &x;
> } else {
>      float y; q = &y;
> }
> 
> Houston, we have a problem.

I don't see a problem at all. The compiler would expand the lifetime of 
x to the outer scope, and do the same for y. Basically, the compiler 
would make it this way in the compiled code:

	int * p;
	float * q;
	int x;
	float y;
	if (condition) {
		p = &x;
	} else {
		q = &y;
	}

A good optimising compiler could also place x and y in a union to save 
some space.


> You can of course patch that little rule in a number of ways, but 
> really at the end of the day what happens only inside a function is 
> uninteresting. The main challenge is making the analysis scalable to 
> multiple functions.

Indeed. Personally, I take the case above as a simple optimisation to 
avoid unnecessary dynamic allocation of x and y when you need to extend 
variable lifetime to a broader scope part of the same function.


>> Follows that if p is outside of the local function, x needs to be 
>> allocated dynamically (just as closures currently do for each variable 
>> they use):
>> 
>>     void f(ref int* p)
>>    {
>>        int x;
>>         p = &x;
>>    }
> 
> Well this pretty much hamstrings pointers. You can take addresses of 
> things inside a function but you can't pass them around. Moreover, 
> people disliked the stealth dynamic allocation when delegates are being 
> used; you are adding more of those.

I'd like to point out that the two things people complained the most 
about regarding the automatic dynamic allocation for dynamic closures:

1.	There is no way to prevent it, to make sure there is no allocation.
2.	The compiler does allocate a lot more than necessary.

In my proposal, these two points are addressed:

1.	You can declare any variable as "scope", preventing it from being placed
	in a broader scope, preventing at the same time dynamic allocation.
2.	The compiler being aware of what arguments do and do not escape the
	scope of the called functions, it won't allocate unnecessarily.

So I think the situation would be much better.

But all this is orthogonal to having or not an escape analysis system, 
as we could choose the reverse conventions: no variable can escape its 
scope unless explicitly authorized by some new syntactic construct.


>> If you want to make sure x never escapes the memory region associated 
>> to its scope, then you can declare x as scope and get a compile-time 
>> error when assigning it to p.
>> 
>> So, in essence, the system I propose is a little simpler because 
>> pointer variables just cannot point to values coming from a region that 
>> doesn't exist in the scope the pointer is declared. The guaranty I 
>> propose is that during the whole lifetime of a pointer, it points to 
>> either a valid memory region, or null. Cyclone's approach is to forbid 
>> you from dereferencing the pointer.
>> 
>> Combine this with my proposal to not have dynamic regions and we don't 
>> need named regions anymore. Perhaps the syntax could be made simpler 
>> with region names, but technically, we don't need them as we can always 
>> go the route of saying that a pointer value is "valid within the scope 
>> of variable_x". This is what I'm expressing with "scopeof(variable_x)" 
>> in my other examples, and I believe it is analogous to the 
>> "regions_of(variable_x)" in Cyclone, although Cyclone doesn't use it 
>> pervasively.
> 
> IMHO this may be made to work. I personally prefer the system in which 
> ref is safe and pointers are permissive. The system you are referring 
> to makes ref and pointer of the same power, so we could as well 
> dispense with either.

I'm not too thrilled by references. I once got a question from someone 
coming from C: what is the difference between a pointer and a reference 
in C++? I had to answer: references are pointers with a different 
syntax, no rebindability, and no possibility of being null. It seems he 
and I both agree that references are mostly a cosmetic patch to solve a 
syntactic problem. References in D aren't much different.

If we could have a unified syntax for pointers of all kinds, I think 
it'd be more convenient than having two kinds of pointers. A 
null-forbiding but rebindable pointer would be more useful in my 
opinion than the current reference concept.


> But I'd be curious what others think of it. Notice how the discussion 
> participants got reduced to you and me, and from what I saw that's not 
> a good sign.

Indeed. I'm interested in other opinions too.

But I'm under the impression that many lost track of what was being 
discussed, especially since we started referring to Cyclone which few 
are familiar with and probably few have read the paper.

One of the fears expressed at the start of the thread was about 
excessive need for annotation, but as the Cyclone paper say, with good 
defaults, you need to add scoping annotation only to a few specific 
places. (It took me some time to read the paper and start discussing 
things sanely after that, remember?) So perhaps we could get more 
people involved if we could propose a tangible syntax for it.

Or perhaps not; for advanced programmers who already understand well 
what can and cannot be done by passing pointers around, full escape 
analysis may not seem to be a so interesting gain since they've already 
adopted the right conventions to avoid most bugs it would prevent. And 
most people here who can discuss this topic with some confidence are 
not newbies to programming and don't make too much mistakes of the sort 
anymore.

Which makes me think of beginners saying pointers are hard. You've 
certainly seen beginners struggle as they learn how to correctly use 
pointers in C or C++. Making sure their program fail at compile-time, 
with an explicative error message as to why they mustn't do this or 
that, is certainly going to help their experience learning the language 
more than cryptic and frustrating segfaults and access violations at 
runtime, sometime far from the source of the problem.

-- 
Michel Fortin
michel.fortin at michelf.com
http://michelf.com/




More information about the Digitalmars-d mailing list