Escape Analysis on reddit

Sat Nov 1 08:19:58 PDT 2008

"Andrei Alexandrescu" wrote
> Walter plans to implement path- and flow-insensitive escape analysis.

This is good!  It is what I was hoping for as a fix for trivial examples.

> In fact he first wants to define scope to work on credit alone (trust the 
> programmer) and later on implement the analysis. I hope I'll be able to 
> talk him out of that, because it's a precisely backwards approach: he'll 
> first allow vale tudo, and then break code when he implements a limited 
> analysis. I think the right thing is to move from conservative to 
> permissive.

I agree that implementing it on credit alone is going to open up problems 
later on.

>> scope T[] findFirst(scope T[] buf, scope T target);
>>
>> scope T[] findFirst5(scope T[] buf)
>> {
>>    scope t5 = new T(5);
>>    return findFirst(buf, t5);
>> }
>
> Same thing, won't compile. This is a clear example where conservative 
> doesn't cut it.

OK, so then your proposed solution won't work for valid code like this. 
Will there be a way to force this to compile?

> There is a paper that I don't have the time to look for, that does a 
> pointer analysis and associate with each function a little graph showing 
> how the return value depends on the inputs. (Does anyone know of it?) I 
> don't think we can afford to implement anything near that level of 
> sophistication.

I think something like this isn't possible, especially using interfaces or 
function pointers, but might be possible if graph dependencies are resolved 
at link-time.  And even then, it might reject some valid code.

>> The problem is, unless you allow fully specifying dependencies, there is 
>> going to be ways people call functions that you didn't think of, which 
>> are legitimate designs, and just can't be done with your new regime.  The 
>> ultimate result is either people cast around the system, don't use the 
>> system, or end up allocating unecessarily on the heap to work around it.
>
> It all depends on how often those cases are encountered.

It only takes one common function to ruin the whole thing.  Like I said 
before, it is viral.  If one lower level function can't be scope, then 
nothing above it can be scope.  Unless there is a way to cast away the scope 
property, in which case you have compromised the system.

>>>> So yes, you are protected, but at the cost of not being able to write 
>>>> code that is provably safe.  The end result is you end up removing 
>>>> scope where it really should be, and then virally, it makes you remove 
>>>> scope decorations everywhere else.  I predict that if this is 
>>>> implemented, nobody will be able to use it.
>>> There are many cases in which statically typed programs disallow valid 
>>> constructs. Each case is a matter of nuance, but by and large types have 
>>> turned out to be successful.
>>
>> I would bet that either A) the proposed system would not provide correct 
>> protection, or B) the system would require many design changes in a 
>> library like Tango, which uses lots of stack variables and inner 
>> functions to achive higher performance.  It just doesn't work with actual 
>> already successful software programs.  It's too limited.
>
> It would be helpful to provide some examples from Tango or other libraries 
> to substantiate this.

I will find some examples.  Just not today, I've got some fall cleanup to do 
;)

>>>> I'll reiterate my challenge from another thread:
>>>>
>>>> void f(ref int *a, int *b, int *c) { if(*b < *c) a = b;  else a = c;}
>>>>
>>>> struct S
>>>> {
>>>>    int *v;
>>>> }
>>>>
>>>> int *f2(S* s) { return s.v;}
>>>>
>>>> void f3(ref int *a, ref int *b, ref int *c)
>>>> {
>>>>    int *tmp = a;
>>>>    a = b; b = c; c = tmp;
>>>> }
>>>>
>>>> void f4()
>>>> {
>>>>    int a=1, b=2, c=3;
>>>>    auto ap = &a;
>>>>    auto bp = &b;
>>>>    auto cp = &c;
>>>>    S s(cp);
>>>>
>>>>    f(ap, bp, cp);
>>>>    b = *f2(&s);
>>>>    f3(ap, bp, cp);
>>>> }
>>>>
>>>> Please tell me how I should mark f, S, f2, and f3 such that f4 
>>>> compiles. Note that this code all works in D1 and provably has no 
>>>> problems with escaping values, since f4 returns void, and no globals 
>>>> are accessed.
>>> First off, we need to clarify that "provably" is a bit strong, and that 
>>> the hypothesis "f4 returns void, and no globals are accessed" is not all 
>>> that's needed for the proof. Full access to function bodies is 
>>> necessary.
>>
>> OK, if f4 returns void, and takes no arguments, and none of the code used 
>> by any of the functions accesses globals (I admit, I may have said this 
>> ambiguously last time), there's not even any heap activity, you tell me 
>> how it can escape.  Prove me wrong.  You have full access to the source 
>> as you can see.
>
> That's exactly what I said. Without access to the bodies you can't tell 
> anything. And I automatically discount access to bodies. Information 
> should be inferred from function signatures.

Sure, which is why I posted the challenge.  You may have misunderstood what 
I meant.  I meant, look at this group of functions.  Having access to the 
source code, you can prove that it is valid, and no escapes occur.  Now 
remove the function bodies for everything but f4, and show me how your 
solution marks up the signatures such that f4 compiles.

I think with being able to specify the scope return value, it's probably a 
lot easier than without that.

>
>>> Correctness may be provable to a human or machine having access to the 
>>> entire body of code, but not if e.g. f, f2, and f3 are invisible to f4.
>>
>> That is the challenge.  If I remove the function bodies, how do you mark 
>> the signatures so you can still prove that it's correct?
>
> Instead of looking on an artificial example, it would be better to work on 
> a corpus of examples extracted from actual code.

OK, I'll find some (per your request above, from Tango).

>>> Second, I contest the entire example. Static typing is supposed to prove 
>>> interesting properties about *most* correct programs, at the cost of 
>>> disallowing a *few* correct programs that are not very frequent, are not 
>>> very interesting, or can be easily replaced with typesafe alternatives. 
>>> One can write correct code based on void* that is virtually impossible 
>>> to typecheck statically, but that doesn't show that static types are 
>>> useless.
>>
>> My example is at least as interesting as Walter's in the article to show 
>> the problem.  If you want to complain about simple examples not being 
>> useful, then stop using simple examples to show your point.  We are not 
>> talking about mucking with void *'s or drastic violations of types here, 
>> we are talking about reference parameters for returns, or an accessor of 
>> a member, or a 3way swap function.  These are all common designs.  The 
>> fact that you cannot find a solution so you resort to just bluntly 
>> calling my example 'uninteresting' speaks for itself.
>
> This is the second time my identity is conflated with that of Walter. I do 
> not write his blog entries! Moreover, I also think they are due for an 
> increase in quality of treatment. About two months ago I told Walter that 
> his writing is overly imprecise and glossing over a lot of detail. To me 
> it conveys the feeling that he considers the reader not too smart and not 
> too critical and just willing to accept a lot of statements at face value. 
> If I were tasked with writing on the same subjects, I'd make damn sure I 
> could stand to lose a toe for every incorrect word in there. (Newsgroup 
> standards aren't that high, hallelujah.) He rightly said in his defense 
> that a blog entry can't be an in-depth treatment for the time he can 
> afford to invest in it. My opinion of the whole banana is that if you want 
> to make an authoritative blog that's also short and sweet you must end up 
> pondering each word carefully and provide ample references for the 
> interested.

Very true, but contrived simple examples are used almost exclusively to 
illustrate a point when discussing merits and drawbacks of theoretical new 
features of the compiler.  You can't just discount it because I made it up; 
it is a valid piece of code, provably not dangerous, and has elements that 
are common in programming design.  Your rebuttal rings hollow because you 
have sidesteped the main argument in the interest of essentially name 
calling on my code.

I'll find some real examples that are like this in Tango.  If nothing else, 
to remove this argument about unimportance.

-Steve