Escape analysis (full scope analysis proposal)

Wed Nov 12 07:02:02 PST 2008

Michel Fortin wrote:
> On 2008-11-09 10:10:03 -0500, Andrei Alexandrescu 
> <SeeWebsiteForEmail at erdani.org> said:
> So my first point is that since we have a garbage collector in D, and 
> moreover since we're likely to get one heap per thread in D2, we don't 
> need dynamic regions. The remaining regions are: 1) the shared heap, 2) 
> the thread-local heap, 3) All the stack frames; and you can't allocate 
> other stack frames than the current one. Because none of these regions 
> require a handle to allocate into, we (A) don't need region handles.
> 
> We still have many regions. Beside the two heaps (shared, thread-local), 
> each function's stack frame, and each block within them, creates a 
> distinct memory region. But nowhere we need to know exactly which region 
> a function parameter comes from; what we need to know is which address 
> outlives which pointer, and then we can forbid assigning addresses to 
> pointers that outlive them. All we need is a relative ordering of the 
> various regions, and for that we don't need to attach *names* to the 
> regions so that you can refer explicitly to them in the syntax. Instead, 
> you could say something like "region of (x)", or "region of (*y)" and 
> that would be enough.

But how do you type then the assignment example?

void assign(int** p, int * r) { *p = *r; }

How do you reflect the requirement that r's region outlives *p's region?

But that's not even the point. Say you define some notation, such as:

void assign(int** p, int * r) if (region(r) <= region(p));

But the whole point of regions was to _simplify_ notations like the 
above into:

void assign(region R)(int*R* p, int *R r);

So although you think you simplified things by using region(symbol) 
instead of symbolic names, you complicated things. The compiler still 
needs to infer regions for each value, so it is as complicated as a 
named-regions compiler, and in addition you require the user to write 
bulkier expressions because you disallow use of symbols. So everybody is 
worse off. Note how in the example using a symbolic region the outlives 
relationship is enforced implicitly by using the same symbol name in two 
places.

I suspect there are things you can't even express without symbolic 
regions. Consider this example from Dan's slides:

struct ILst(region R1, region R2) {
     int *R1 hd;
     ILst!(R1, R2) *R2 tl;
}

This code reflects the fact that the list holds pointer to integers in 
one region, whereas the nodes themselves are in a different region. It 
would be a serious challenge to tackle that without symbolic regions, 
and simpler that won't be for anybody.

I'll insert a few more points below in this sprawling discussion.

> Which makes me think of this:
> 
>     struct A { int i; this(); }
>     ref A foo(ref A a) { return a; }
> 
>     ref A bar()
>     {
>         foo(A()).i = 1;
> 
>         ref A a = foo(A()); // illegal, ref cannot be used outside 
> function signature
>         a.i = 1;
> 
>         return foo(A()); // illegal ?
>     }

foo(A()) is illegal because ref does not bind to an rvalue.

> Also, I'd like to point out that ref (and out) being storage classes 
> somewhat hinder me from using them where it makes sense in the 
> D/Objective-C bridge, since there most functions are instanciated by 
> templates where template arguments give the type of each function 
> argument. Perhaps there should be a way to specify "ref" and "out" in 
> template arguments...

I agree. Something like that is on the list.

>>> If we could have a unified syntax for pointers of all kinds, I think 
>>> it'd be more convenient than having two kinds of pointers. A 
>>> null-forbiding but rebindable pointer would be more useful in my 
>>> opinion than the current reference concept.
>>
>> Well ref means "This function wants to modify its argument". That is a 
>> very different charter from what pointers mean. So I'm not sure how 
>> you say you'd much prefer this to that. They are not comparable.
> 
> I was under the impression that ref would be allowed as a storage class 
> for local variables. I'll say it's perfectly acceptable for function 
> arguments, but I'm less sure about function return types.

As of now, ref is not planned for local variables.

> Also, I'd still like to have a non-null pointer type, especially for 
> clarifying function sigatures. A template can do. If it was in the 
> language however it be used by more people, which would be better.

I don't grok this notion "if it's in the language it would be used by 
more people". How does that come about? Does it mean templates are at 
such a high syntactic disadvantage? Maybe we should do something about 
that then, such as replacing !() with something else :o). If we put it 
in phobos (which after integration will be usable alongside with tango) 
could it count as being in the language?

>>>> But I'd be curious what others think of it. Notice how the 
>>>> discussion participants got reduced to you and me, and from what I 
>>>> saw that's not a good sign.
>>>
>>> Indeed. I'm interested in other opinions too.
>>>
>>> But I'm under the impression that many lost track of what was being 
>>> discussed, especially since we started referring to Cyclone which few 
>>> are familiar with and probably few have read the paper.
>>
>> In my experience, when someone is interested in something, she'd make 
>> time for it. So I take that as lack of interest. And hey, since when 
>> was lack of expertise a real deterrent? :o)
> 
> As I said below, I think many people in this group are already 
> confortable with using pointers, which may explain why they're not so 
> interested. Having no one interested in something doesn't necessarly 
> mean they won't appreciate it when it comes.

That I totally agree with. It's happened a couple of times with D features.

> It does, however reduce the incitative for continuing forward. So I 
> understand why you're backing off, even if it displease me somewhat.

I'm sorry about how you feel. Now we're in a conundrum of sorts. You 
seem to strongly believe you can make some nice simplified regions work, 
and make people like them. Taking that to a proof is hard. The conundrum 
is, you are facing the prospect of putting work into it and creating a 
system that, albeit correct, is not enticing.

>>> One of the fears expressed at the start of the thread was about 
>>> excessive need for annotation, but as the Cyclone paper say, with 
>>> good defaults, you need to add scoping annotation only to a few 
>>> specific places. (It took me some time to read the paper and start 
>>> discussing things sanely after that, remember?) So perhaps we could 
>>> get more people involved if we could propose a tangible syntax for it.
>>
>> To be very frank, I think we are very far from having an actual 
>> proposal, and syntax is of very low priority now if you want to put 
>> one together. Right now what we have is a few vague ideas and 
>> conjectures (e.g., there's no need for named regions because the need 
>> would be rare enough to require dynamic allocation for those cases). 
>> I'm not saying that to criticize, but merely to underline the 
>> difficulties.
> 
> I never said the need for dynamic regions would be rare: I said garbage 
> collector obsoletes it. If we can justify the need for dynamic regions 
> later, we can add them back (with all the added complexity it requires) 
> but I'd try without them first.

Let's not forget that symbolic regions (for typing purposes) should not 
be confused with dynamic regions (for efficiency purposes). I agree we 
can do away with the latter and put them in later if we care. I disagree 
that dropping symbolic regions simplifies things.

>>> Or perhaps not; for advanced programmers who already understand well 
>>> what can and cannot be done by passing pointers around, full escape 
>>> analysis may not seem to be a so interesting gain since they've 
>>> already adopted the right conventions to avoid most bugs it would 
>>> prevent. And most people here who can discuss this topic with some 
>>> confidence are not newbies to programming and don't make too much 
>>> mistakes of the sort anymore.
>>>
>>> Which makes me think of beginners saying pointers are hard. You've 
>>> certainly seen beginners struggle as they learn how to correctly use 
>>> pointers in C or C++. Making sure their program fail at compile-time, 
>>> with an explicative error message as to why they mustn't do this or 
>>> that, is certainly going to help their experience learning the 
>>> language more than cryptic and frustrating segfaults and access 
>>> violations at runtime, sometime far from the source of the problem.
>>
>> I totally agree that pointers are hard and good static checking for 
>> them would help. Currently, what we try to do is obviate the need for 
>> pointers in most cases, and to actually forbid them in safe modules.
> 
> But dynamic arrays *are* pointers, how are you oblivating the need for 
> them? If you find a solution for dynamic arrays, you'll have a solution 
> for pointers too.
> 
> You could forbid dynamic arrays from refering to stack-allocated static 
> ones, or automatically dynamically allocate those when they escape in a 
> dynamic array. And if I were you, whatever you choose for arrays I'd 
> allow it for pointers too, to keep things consistent. Pointer to heap 
> objects should be retained in my opinion.

But a possible path is to make arrays safe and leave pointers for those 
cases in which efficiency is of utmost importance. With luck, those 
cases are rare.

>> The question that remains is, how many unsafe modules are necessary, 
>> and what liability do they entail? If there are few and not too 
>> unwieldy, maybe we can declare victory without constructing an escape 
>> analyzer. I agree if you or anyone says they don't think so. At this 
>> point, I am not sure, but what I can say is that it's good to reduce 
>> the need for pointers regardless.
> 
> But are you reducing the need for pointers or hiding and restricting 
> them?

Of course - that's the whole point. In fact, I'll insert a small 
correction: we are reducing the need for pointers BY hiding and 
restricting them. And that's a good thing. If you can do most of your 
work with restricted pointers (e.g. ref), then that's a net win.

Andrei