Escape analysis (full scope analysis proposal)

Wed Oct 29 04:28:55 PDT 2008

On 2008-10-28 23:52:04 -0400, "Robert Jacques" <sandford at jhu.edu> said:

> I've run across some academic work on ownership types which seems 
> relevant  to this discussion on share/local/scope/noscope.

I haven't read the paper yet, but the overview seems to go in the same 
direction as I was thinking.

Basically, all the scope variables you can get are guaranteed to be in 
the current scope or in some ancestor scope. To allow a reference to a 
scope variable, or a scope function, to be put inside a member of a 
struct or class, you only need to prove that the lifetime of the struct 
or class is shorter than or equal to that of the scope variable being 
referenced. If you could tell the compiler the scope relationship of the 
various arguments, then you'd have pretty good scope analysis.
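
As a rough illustration of that rule, scopes can be modelled as nesting 
depths and the check reduces to a comparison. This is only a sketch: the 
names Scope, outlives and canStore are made up for the example and are 
not part of any compiler, and it assumes both scopes sit on the same 
call chain.

```d
import std.stdio;

// Hypothetical model: a scope is identified by its nesting depth.
// Depth 0 is the global scope; each nested call or block adds 1.
struct Scope
{
    int depth;
}

// A variable in scope `a` lives at least as long as one in scope `b`
// when `a` is the same scope or an ancestor (not more deeply nested).
bool outlives(Scope a, Scope b)
{
    return a.depth <= b.depth;
}

// Storing a reference in a holder is allowed when the referent
// lives at least as long as the holder does.
bool canStore(Scope holder, Scope referent)
{
    return outlives(referent, holder);
}

void main()
{
    auto caller = Scope(1);
    auto callee = Scope(2);

    writeln(canStore(callee, caller)); // holder in callee, referent in caller
    writeln(canStore(caller, callee)); // holder in caller, referent in callee
    writeln(canStore(caller, caller)); // holder and referent in the same scope
}
```

A real analysis would work on the scope tree rather than on plain 
integers, since two sibling scopes can have the same depth without 
either outliving the other.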

For instance, with a syntax like the following, we could declare that i 
is available during the whole lifetime of o:

	void foo(scope MyObject o, scope(o) int* i)
	{
		o.i = i;
	}

So you could do:

	void bar()
	{
		scope int i;
		scope MyObject o = new MyObject;
		foo(o, &i);
	}

And the compiler would let it pass because foo guarantees not to keep 
references to i outside of o's scope, and o's scope is the same as i's.

Or you could do:

	void test1()
	{
		int i;
		test2(&i);
	}

	void test2(scope int* i)
	{
		scope o = new MyObject;
		foo(o, i); // i is already an int*, so pass it directly
	}

Again, the compiler can statically check that test2 won't keep a 
reference to i outside of the caller's scope (test1) because o's scope 
is limited to test2.

And if you try the reverse:

	void test1()
	{
		scope o = new MyObject;
		test2(o);
	}

	void test2(scope MyObject o)
	{
		int i;
		foo(o, &i);
	}

Then the compiler could determine automatically that i needs to escape 
test2's scope and allocate the variable on the heap to make its 
lifetime as long as the object's scope (as it does currently with 
nested functions) [see my reservations about this in the postscript]. 
This could be avoided by explicitly binding i to the current scope, in 
which case the compiler could issue a scope error:

	void test2(scope MyObject o)
	{
		scope int i;
		// error: i's scope needs to match o's, but i is bound to
		// the current scope.
		foo(o, &i);
	}

Interestingly, with this scheme, assuming your function arguments are 
properly scope-labeled, you never need to allocate variables on the 
heap explicitly anymore. The compiler can take care of it for you when 
the use of the variable inside the function body requires it.

	void test3(int* i); // unscoped parameter
	void test4()
	{
		// allocated on the heap because calling test3 requires an
		// unscoped variable.
		int i;
		test3(&i);
	}
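
The choice the compiler would be making for each local can be pictured 
as a small decision table. The sketch below is hypothetical: Storage and 
chooseStorage are names invented for the example, and "escapes" stands 
for whatever the scope analysis concludes from the parameter labels of 
the functions the variable is passed to.

```d
import std.stdio;

// Possible outcomes for a local variable under the proposed scheme.
enum Storage { stack, heap, error }

// Hypothetical decision procedure.
//   boundToScope: the variable was declared explicitly with `scope`
//   escapes:      some use requires it to outlive the current function
Storage chooseStorage(bool boundToScope, bool escapes)
{
    if (!escapes)
        return Storage.stack;        // never leaves the function
    return boundToScope
        ? Storage.error              // promised local, but must escape
        : Storage.heap;              // promote silently to the heap
}

void main()
{
    // plain `int i` passed to test3(int*), an unscoped parameter
    writeln(chooseStorage(false, true));
    // `scope int i` passed where it would have to escape
    writeln(chooseStorage(true, true));
    // variable only ever passed to scope-labeled parameters
    writeln(chooseStorage(false, false));
}
```

The first call corresponds to test4 above, and the second to the 
version of test2 that declares i as scope.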

The reverse is also true: objects declared as allocated on the heap 
could be automatically rescoped as local stack variables if their use 
inside the function is limited in scope:

	void test5()
	{
		auto o = new MyObject;
		test2(o);
	}

For instance, in test5 above, where o isn't declared as scope, the 
compiler could still allocate o on the stack (as long as it knows the 
constructor doesn't leave unwanted references to the object in the 
global state), because it knows from the argument declaration of test2 
that no references to o will leave the current scope.

So basically, what to heap-allocate and what to stack-allocate could be 
left entirely to the compiler's discretion.

Note that for all this to work, the pointer "i" in MyObject must be 
defined as not escaping the scope of the class:

	class MyObject
	{
		scope int* i;
	}

or else someone could take the reference and put it into a global 
variable, or a variable of a greater scope than the object.

P.S.: I'm still somewhat skeptical about this automatic allocation 
thing because it would mean a lot of extra heap allocation (and thus 
loss of performance) for any function whose parameters are not 
properly scoped. Perhaps the default should be local scope, and you 
would explicitly make it greater by declaring variables as noscope, 
which would allow the compiler to allocate on the heap if needed. But 
that doesn't solve the issue of needing to allocate on the heap in 
order to safely call functions that don't use scope-labeled arguments.

P.P.S.: This syntax doesn't fit very well with the current 
scope(success/failure/exit) feature.

-- 
Michel Fortin
michel.fortin at michelf.com
http://michelf.com/