Escape analysis (full scope analysis proposal)

Thu Oct 30 05:14:31 PDT 2008

On 2008-10-29 15:10:00 -0400, "Robert Jacques" <sandford at jhu.edu> said:

> On Wed, 29 Oct 2008 07:28:55 -0400, Michel Fortin  
> <michel.fortin at michelf.com> wrote:
> 
>> Basically, all the scope variables you can get are guaranteed to be in
>> the current scope or in some ancestor scope. To allow a reference to a
>> scope variable, or a scope function, to be put inside a member of a
>> struct or class, you only need to prove that the struct or class
>> lifetime is shorter than or equal to that of the reference to your
>> scope variable. If you could tell the compiler the scope relationship
>> of the various arguments, then you'd have pretty good scope analysis.
>> 
>> For instance, with this syntax, we could define i to be available
>> during the whole lifetime of o:
>> 
>> 	void foo(scope MyObject o, scope(o) int* i)
>> 	{
>> 		o.i = i;
>> 	}
> 
> What does the scope part of 'scope MyObject o' mean? (i.e. is this D's  
> current scope or something else?)

Ok, I should have defined that better. It means that o is bound to the
caller's scope (possibly on the stack). A scope is created for each
function and for each {}-delimited block inside it; taken together, the
scopes basically form the stack of the current thread. Once you exit a
scope, its variables cease to exist, and we must ensure that no
references to them remain.

In this case, "scope MyObject o" means that we're receiving a MyObject
reference which could be pointing somewhere down the stack *or* to the
heap. We have to assume the most restrictive constraint, however, so
let's say it's on the stack. The rule is that you can't place a
reference to a scoped variable anywhere below its scope in the stack,
which ensures that you can't keep a reference to a variable which no
longer exists once the top scope has disappeared.

Scope stack (call stack with the global scope at the bottom):
 1. foo ( scope MyObject o = function1.o ) { }
 2. function1 () { scope MyObject o, int i }
 3. main () { }
 ...
 n. global scope

In practical terms, "scope MyObject o" means that we can't put a
reference to the object anywhere that lives beyond the current function
call... except in a scope return value, but I haven't gotten into that yet.
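
To make the rule concrete, here is a minimal sketch in plain D, without
any of the proposed syntax. The names lowest and function1 are made up
for illustration. The assignment the rule would forbid is left commented
out, on the assumption that ordinary unchecked code would otherwise
accept it.

```d
import std.stdio;

// Bottom of the scope stack: this pointer outlives every function call.
int* lowest;

int function1()
{
    int i = 42;          // i lives in function1's scope

    // lowest = &i;      // what the rule forbids: the global scope is
    //                   // below function1's scope, so the reference
    //                   // would outlive i.

    int* sameScope = &i; // allowed: sameScope dies together with i
    return *sameScope;   // returning a copy of the value is always fine
}

void main()
{
    writeln(function1());
    assert(lowest is null); // nothing escaped to the global scope
}
```

The only reference to i that is ever created sits in the same scope as i
itself, so nothing can dangle once function1 returns.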

> What does 'scope(o)' explicitly mean? I'm going to assume scope(o) 
> means  the scope of o.

That's it... mostly. scope(o) is the scope of o, or any scope below o.
Read it as any scope that stays valid for as long as o exists. If o were
not scope, scope(o) would be noscope.
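
As a rough sketch of what scope(o) is meant to promise, the code below
uses plain D and keeps the proposed annotations in comments only, since
scope(o) is the proposal's syntax and not something assumed to compile.
The MyObject definition and the values are placeholders for illustration.

```d
class MyObject
{
    int* i;
}

// Proposed signature (from this thread, not accepted by the compiler):
//     void foo(scope MyObject o, scope(o) int* i)
// scope(o) promises that *i stays valid for as long as o does,
// which is what makes the assignment below safe.
void foo(MyObject o, int* i)
{
    o.i = i;
}

void main()
{
    int i = 5;
    auto o = new MyObject; // in the proposal: scope MyObject o
    foo(o, &i);            // i and o share main's scope, so scope(o) holds
    assert(*o.i == 5);
}
```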

>> So you could do:
>> 
>> 	void bar()
>> 	{
>> 		scope int i;
>> 		scope MyObject o = new MyObject;
>> 		foo(o, &i);
>> 	}
>> 
>> And the compiler would let it pass because foo guarantees not to keep
>> references to i outside of o's scope, and o's scope is the same as i's.
>> 
>> Or you could do:
>> 
>> 	void test1()
>> 	{
>> 		int i;
>> 		test2(&i);
>> 	}
>> 
>> 	void test2(scope int* i)
>> 	{
>> 		scope o = new MyObject;
>> 		foo(o, &i);
> Error: &i is of type int** while foo takes an int*. Did you mean foo(o, i)?

Oops. Indeed, I meant foo(o, i).

>> 	}
>> 
>> Again, the compiler can statically check that test2 won't keep a
>> reference to i outside of the caller's scope (test1) because o's scope
>> is limited to test2.
> 
> The way I read your example, no useful escape analysis can be done by
> the compiler, and it works mainly because i is a pointer to a value
> type.

It's not escape analysis. It's scoping constraints, enforced by making
sure that every function declares what may escape and what may not.

If this were a pure value type passed by copy, scope would indeed be
meaningless, as there would be no reference that could escape.

>> And if you try the reverse:
>> 
>> 	void test1()
>> 	{
>> 		scope o = new MyObject;
>> 		test2(o);
>> 	}
>> 
>> 	void test2(scope MyObject o)
>> 	{
>> 		int i;
>> 		foo(o, &i);
>> 	}
>> 
>> Then the compiler could determine automatically that i needs to escape
>> test2's scope and allocate the variable on the heap to make its
>> lifetime as long as the object's scope (as it does currently with
>> nested functions) [see my reservations about this in the postscript].
>> This could be avoided by explicitly binding i to the current scope, in
>> which case the compiler could issue a scope error:
> 
> The way I read this is o is of type scope MyObject, i is of type scope 
> int  and therefore foo(o,&i) is valid and an escape happens.

That's my point. The compiler can detect that an escape may happen just
by looking at the function prototype for foo. The prototype tells us
that foo needs i to be at the same scope as o or a lower one, which is
not the case here.

The compiler can then decide to allocate i dynamically on the heap to
make sure it exists for at least the scope of o; or we could decide to
just make that illegal. I prefer automatic heap allocation, as it
means we can get rid of the decision to statically or dynamically
allocate variables: the compiler can decide based on the function
prototypes whichever is best. For cases where you really mean a variable
to be on the stack, you can use scope, as in:

	scope int i;

and the compiler would just issue an error if you attempt to give a
reference to i to a function that wants to use it in a lower scope.
Otherwise, the compiler would be free to choose between local and
heap allocation.
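
For illustration, here is roughly what the automatic heap allocation
would amount to if written out by hand in plain D. This is only a sketch
of the idea, not a description of what any compiler does: foo stands in
for the proposed foo(scope MyObject o, scope(o) int* i), and the
MyObject definition and the value 7 are placeholders.

```d
class MyObject
{
    int* i;
}

// Stand-in for foo(scope MyObject o, scope(o) int* i) from the proposal.
void foo(MyObject o, int* i)
{
    o.i = i;
}

void test2(MyObject o)
{
    // o comes from the caller, so it outlives test2's scope. A plain
    // "int i;" on the stack would die too early, so under the proposal
    // the compiler would do the equivalent of this heap allocation:
    int* i = new int;
    *i = 7;
    foo(o, i);
}

void main()
{
    auto o = new MyObject;
    test2(o);
    assert(*o.i == 7); // still valid: the int lives on the heap
}
```

Writing "scope int i;" in test2 would instead turn the call to foo into
a compile-time error under the proposal, since i could no longer be
moved to the heap.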

-- 
Michel Fortin
michel.fortin at michelf.com
http://michelf.com/