Escape analysis (full scope analysis proposal)

Fri Oct 31 20:13:50 PDT 2008

On 2008-10-31 11:11:26 -0400, "Steven Schveighoffer" 
<schveiguy at yahoo.com> said:

> "Michel Fortin" wrote
>> Basically, by documenting better the interfaces in a machine-readable way,
>> we are freed of other burdens the compiler can now take care of. In
>> addition, we have better defined interfaces and the compiler has a lot
>> more room to optimize things.
> 
> But the burden you have left for the developer is a tough one.  You have to
> analyze the inputs and function calls from a function and determine which
> variable depends on what.  This is a perfect problem for a tool to solve.
> 
>> The problem is that as soon as you have a function declaration without the
>> body, the lint tool won't be able to tell you if it escapes or not.
> 
> This I agree is a problem.  In fact, without specifications in the function
> things like interfaces would be very difficult to determine scope-ness at
> compile time.

If you can't determine yourself that a function can work with scoped 
parameters, you'd better never call that function with references to 
local variables and leave its prototype with noscope parameters, making 
the compiler aware of the situation.

In any case, the one who designs the function is the one most likely 
able to tell you whether or not it accepts scoped arguments. The 
current situation makes the caller of that function responsible for 
calling it correctly. I think that's backward.

> The only way I can see to solve this is to do it at link time.  When you
> link, piece together the parts of the graph that were incomplete, and see if
> they all work.  It would be a very radical change, and might not even work
> with the current linkers.  Especially if you want to do shared libraries,
> where the linker is builtin to the OS.

I think you're dreaming... not that it's a bad thing to have ambition, 
but that's probably not even possible.

> A related question: how do you handle C functions?

You read the documentation of the function to determine if the function 
will let the pointer escape somewhere, and if not declare the parameter 
scope. For instance:

	extern (C)
	int printf(scope char* format, scope...);

By the way, extern (C) functions with noscope parameters need careful 
consideration, since pointers kept by C code aren't tracked by the 
garbage collector.

>> So, without a way to specify the requested scope of the parameters, you'll
>> very often have holes in your escape analysis that will propagate down the
>> caller chain, preventing any useful conclusion.
> 
> Yes, and if a function has mis-specified some of its parameters, then you
> have code that doesn't compile.  Or the function itself won't compile, and
> you need to do some more manual analysis.  Imagine a function that calls 5
> or 6 other functions with its parameters.  And there are multiple different
> dependencies you have to resolve.  That's a lot of analysis you have to do
> manually.

You'll get an error at some call site, which can mean only two things: 
either your local variable shouldn't be bound to the local scope 
(because the function expects a reference it can keep beyond its scope) 
so you should allocate it on the heap, or the function you're calling 
has its prototype wrong.

There's a chance that fixing the function prototype will create 
problems upward if it tries to put a reference to a scope variable in a 
global, or pass it to a function as a noscope argument.
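
To make the two outcomes concrete, here is a minimal sketch in plain D. 
It assumes a compiler that accepts `scope` on a pointer parameter; the 
names `kept`, `readOnly` and `keep` are hypothetical, and the rejected 
call appears only as a comment because whether a given compiler 
diagnoses it is an assumption.

```d
int* kept;  // global: outlives every caller's stack frame

// Does not retain p, so it can be handed the address of a local.
int readOnly(scope const(int)* p) { return *p; }

// Retains p (a "noscope" parameter in the proposal's terms).
void keep(int* p) { kept = p; }

void main()
{
    int local = 42;
    assert(readOnly(&local) == 42);  // fine: nothing escapes

    // keep(&local);  // the call site the proposal would flag

    // Fix number one: give the value a longer lifetime.
    int* onHeap = new int;
    *onHeap = 42;
    keep(onHeap);
    assert(*kept == 42);
}
```

Fix number two would be to change `keep` itself, if its prototype 
turned out to be wrong.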

>> I don't think it's bad to force interfaces to be well documented, and
>> documented in a format that the compiler can understand to find errors
>> like this.
> 
> I think this concept is going to be really hard for a person to decipher,
> and really hard to get right.

It takes some thinking to get the prototype right at first. But it 
takes less caution calling the function later with local variables 
since the compiler will either issue an error or automatically fix the 
issue by allocating on the heap when an argument requires a greater 
scope.
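
D2 already does something comparable for delegates: when a delegate 
that captures a local variable is returned, the captured variables are 
allocated on the heap rather than on the stack. The sketch below uses 
that existing behavior only as an analogy for what the proposal would 
do with pointers; `makeCounter` is a hypothetical name.

```d
// `count` is a local, but the returned delegate still refers to it,
// so the compiler places it in a heap-allocated closure.
int delegate() makeCounter()
{
    int count = 0;
    return delegate int() { return ++count; };
}

void main()
{
    auto next = makeCounter();
    // makeCounter's stack frame is gone, yet `count` is still alive.
    assert(next() == 1);
    assert(next() == 2);
}
```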

> We are talking about a graph dependency
> analysis, in which many edges can exist, and the vertices do not necessarily
> have to be parameters.  This is not stuff for the meager developer looking
> to get work done to have to think about.  I'd much rather have a tool that
> does it, if not the compiler, then something else.  Or partial analysis.  Or
> no analysis.  I agree it's good to have bugs caught by the compiler, but
> this solution requires too much work from the developer to be used.
> 
> Some fun puzzles for you to come up with a proper scope syntax to use:
> 
> void f(ref int *a, int *b, int *c) { if(*b < *c) a = b;  else a = c;}

	void f(scope ref int *a, scopeof(a) int *b, scopeof(a) int *c)
	{
		if (*b < *c) a = b; else a = c;
	}
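
Read from the caller's side, the annotations say that whatever `b` and 
`c` point to must live at least as long as the pointer bound to `a`. 
Here is a sketch in plain D without the proposed annotations (they are 
not implemented, so the constraint is stated only in comments); the 
variable names in `main` are hypothetical.

```d
void f(ref int* a, int* b, int* c)
{
    if (*b < *c) a = b; else a = c;
}

void main()
{
    int x = 3, y = 7;
    int* smaller;          // same scope as x and y, so the call is fine
    f(smaller, &x, &y);
    assert(smaller is &x);
    assert(*smaller == 3);

    // If `smaller` were a global instead, passing &x and &y would be
    // the error case: the global would outlive what it points to.
}
```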

> struct S
> {
>    int *v;
> }
> 
> int *f2(S* s) { return s.v;}

Here you have two options depending on what you mean. Your example 
above is valid, but would allow v to point only to heap variables. If 
your intention is that S.v should be able to refer to scope variables 
too, then you'd need to write S as:

	struct S
	{
		scope int *v;
	}

Then, no function can copy this pointer and keep it beyond the scope 
of S. Therefore, the function needs to be updated to propagate this 
property:

	scopeof(s) int *f2(scope S* s) { return s.v; }
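
A caller's view of that second option, again as a sketch in plain D 
with the proposed annotations reduced to comments (the variable names 
in `main` are hypothetical):

```d
struct S
{
    int* v;   // `scope int *v` in the proposal
}

// `scopeof(s)` on the result: it must not outlive s.
int* f2(S* s) { return s.v; }

void main()
{
    int local = 5;
    S s;
    s.v = &local;       // what the scope member is meant to permit
    int* p = f2(&s);    // p lives no longer than s, so this is fine
    assert(p is &local);
    assert(*p == 5);
}
```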

> void f3(ref int *a, ref int *b, ref int *c)
> {
>    int *tmp = a;
>    a = b; b = c; c = tmp;
> }

This one is special, because you have a circular reference between the 
parameters. Note that a simpler example of this would be swapping two 
values. I had to invent something here saying that all these variables 
share the same scope... but I'd agree the syntax isn't so good.

	void f3(ref scope(1) int *a, ref scope(1) int *b, ref scope(1) int *c)
	{
		scope int *tmp = a;
		a = b; b = c; c = tmp;
	}
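
The reason the three parameters must share one scope shows up as soon 
as you trace the rotation: each pointer ends up stored in a different 
variable, so none of them may point to something shorter-lived than the 
others. Here is a sketch in plain D, without the proposed `scope(1)` 
syntax; the variable names in `main` are hypothetical.

```d
void f3(ref int* a, ref int* b, ref int* c)
{
    int* tmp = a;
    a = b; b = c; c = tmp;
}

void main()
{
    int x = 1, y = 2, z = 3;
    int* a = &x;
    int* b = &y;
    int* c = &z;
    f3(a, b, c);
    // Every pointer has moved to a different variable.
    assert(a is &y);
    assert(b is &z);
    assert(c is &x);
}
```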

-- 
Michel Fortin
michel.fortin at michelf.com
http://michelf.com/