Escape analysis (full scope analysis proposal)

Wed Nov 12 05:47:22 PST 2008

On 2008-11-09 10:10:03 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail at erdani.org> said:

> Michel Fortin wrote:
>> I'd like to point out that the two things people complained the most 
>> about regarding the automatic dynamic allocation for dynamic closures:
>> 
>> 1.    There is no way to prevent it, to make sure there is no allocation.
>> 2.    The compiler does allocate a lot more than necessary.
>> 
>> In my proposal, these two points are addressed:
>> 
>> 1.    You can declare any variable as "scope", preventing it from being placed
>>     in a broader scope, preventing at the same time dynamic allocation.
>> 2.    The compiler being aware of what arguments do and do not escape the
>>     scope of the called functions, it won't allocate unnecessarily.
>> 
>> So I think the situation would be much better.
> 
> I agree that an escape analyzer would improve things. I am not sure 
> that one oblivious to regions is expressive enough.

If you think I proposed a region-oblivious scheme, then you've got me 
wrong (and perhaps it's my fault for not explaining well enough). Let 
me explain again, and I'll try to not skip anything this time.

Cyclone has dynamic regions, regions which are allocated on the heap 
but that are deleted at the end of the scope that created them. 
Basically, those are scoped heaps offering a very useful system to 
automatically free memory. (It's somewhat similar in concept to Cocoa's 
NSAutoReleasePool for instance.) The downside of them is that you need 
to pass region handle around (so called functions can allocate objects 
within them).

So my first point is that since we have a garbage collector in D, and 
moreover since we're likely to get one heap per thread in D2, we don't 
need dynamic regions. The remaining regions are: 1) the shared heap, 2) 
the thread-local heap, 3) All the stack frames; and you can't allocate 
other stack frames than the current one. Because none of these regions 
require a handle to allocate into, we (A) don't need region handles.

We still have many regions. Beside the two heaps (shared, 
thread-local), each function's stack frame, and each block within them, 
creates a distinct memory region. But nowhere we need to know exactly 
which region a function parameter comes from; what we need to know is 
which address outlives which pointer, and then we can forbid assigning 
addresses to pointers that outlive them. All we need is a relative 
ordering of the various regions, and for that we don't need to attach 
*names* to the regions so that you can refer explicitly to them in the 
syntax. Instead, you could say something like "region of (x)", or 
"region of (*y)" and that would be enough.

So there is still a region for every pointer, only regions don't need 
to be *named* because you can always refer to them by referring to the 
variables. (And perhaps the syntax would be clearer with region names 
than without, in which case I don't mind we use them. But they're not 
required for the concept to work.)

>> I'm not too thrilled by references. I once got a question from someone 
>> coming from C: what is the difference between a pointer and a reference 
>> in C++? I had to answer: references are pointers with a different 
>> syntax, no rebindability, and no possibility of being null. It seems he 
>> and I both agree that references are mostly a cosmetic patch to solve a 
>> syntactic problem. References in D aren't much different.
> 
> I disagree. References in D are very different. They are not type 
> constructors. They are storage classes that can only be used in 
> function signatures, which makes them impossible to dangle. I think C++ 
> references would also have been much better off as storage classes 
> instead of half-life types.

Which makes me think of this:

	struct A { int i; this(); }
	ref A foo(ref A a) { return a; }

	ref A bar()
	{
		foo(A()).i = 1;

		ref A a = foo(A()); // illegal, ref cannot be used outside function signature
		a.i = 1;

		return foo(A()); // illegal ?
	}

Also, I'd like to point out that ref (and out) being storage classes 
somewhat hinder me from using them where it makes sense in the 
D/Objective-C bridge, since there most functions are instanciated by 
templates where template arguments give the type of each function 
argument. Perhaps there should be a way to specify "ref" and "out" in 
template arguments...

>> If we could have a unified syntax for pointers of all kinds, I think 
>> it'd be more convenient than having two kinds of pointers. A 
>> null-forbiding but rebindable pointer would be more useful in my 
>> opinion than the current reference concept.
> 
> Well ref means "This function wants to modify its argument". That is a 
> very different charter from what pointers mean. So I'm not sure how you 
> say you'd much prefer this to that. They are not comparable.

I was under the impression that ref would be allowed as a storage class 
for local variables. I'll say it's perfectly acceptable for function 
arguments, but I'm less sure about function return types.

Also, I'd still like to have a non-null pointer type, especially for 
clarifying function sigatures. A template can do. If it was in the 
language however it be used by more people, which would be better.

>>> But I'd be curious what others think of it. Notice how the discussion 
>>> participants got reduced to you and me, and from what I saw that's not 
>>> a good sign.
>> 
>> Indeed. I'm interested in other opinions too.
>> 
>> But I'm under the impression that many lost track of what was being 
>> discussed, especially since we started referring to Cyclone which few 
>> are familiar with and probably few have read the paper.
> 
> In my experience, when someone is interested in something, she'd make 
> time for it. So I take that as lack of interest. And hey, since when 
> was lack of expertise a real deterrent? :o)

As I said below, I think many people in this group are already 
confortable with using pointers, which may explain why they're not so 
interested. Having no one interested in something doesn't necessarly 
mean they won't appreciate it when it comes.

It does, however reduce the incitative for continuing forward. So I 
understand why you're backing off, even if it displease me somewhat.

>> One of the fears expressed at the start of the thread was about 
>> excessive need for annotation, but as the Cyclone paper say, with good 
>> defaults, you need to add scoping annotation only to a few specific 
>> places. (It took me some time to read the paper and start discussing 
>> things sanely after that, remember?) So perhaps we could get more 
>> people involved if we could propose a tangible syntax for it.
> 
> To be very frank, I think we are very far from having an actual 
> proposal, and syntax is of very low priority now if you want to put one 
> together. Right now what we have is a few vague ideas and conjectures 
> (e.g., there's no need for named regions because the need would be rare 
> enough to require dynamic allocation for those cases). I'm not saying 
> that to criticize, but merely to underline the difficulties.

I never said the need for dynamic regions would be rare: I said garbage 
collector obsoletes it. If we can justify the need for dynamic regions 
later, we can add them back (with all the added complexity it requires) 
but I'd try without them first.

>> Or perhaps not; for advanced programmers who already understand well 
>> what can and cannot be done by passing pointers around, full escape 
>> analysis may not seem to be a so interesting gain since they've already 
>> adopted the right conventions to avoid most bugs it would prevent. And 
>> most people here who can discuss this topic with some confidence are 
>> not newbies to programming and don't make too much mistakes of the sort 
>> anymore.
>> 
>> Which makes me think of beginners saying pointers are hard. You've 
>> certainly seen beginners struggle as they learn how to correctly use 
>> pointers in C or C++. Making sure their program fail at compile-time, 
>> with an explicative error message as to why they mustn't do this or 
>> that, is certainly going to help their experience learning the language 
>> more than cryptic and frustrating segfaults and access violations at 
>> runtime, sometime far from the source of the problem.
> 
> I totally agree that pointers are hard and good static checking for 
> them would help. Currently, what we try to do is obviate the need for 
> pointers in most cases, and to actually forbid them in safe modules.

But dynamic arrays *are* pointers, how are you oblivating the need for 
them? If you find a solution for dynamic arrays, you'll have a solution 
for pointers too.

You could forbid dynamic arrays from refering to stack-allocated static 
ones, or automatically dynamically allocate those when they escape in a 
dynamic array. And if I were you, whatever you choose for arrays I'd 
allow it for pointers too, to keep things consistent. Pointer to heap 
objects should be retained in my opinion.

> The question that remains is, how many unsafe modules are necessary, 
> and what liability do they entail? If there are few and not too 
> unwieldy, maybe we can declare victory without constructing an escape 
> analyzer. I agree if you or anyone says they don't think so. At this 
> point, I am not sure, but what I can say is that it's good to reduce 
> the need for pointers regardless.

But are you reducing the need for pointers or hiding and restricting 
them? I'd say the later. Reference are pointers with restrictions. 
Object references are no different from pointer except in syntax (they 
can even point to stack allocated objects with scope classes). Dynamic 
arrays are pointers with a certain range. Closure have a pointer to a 
stack frame, which can be heap-allocated or not.

The only way to have a safe system without escape analysis is to force 
everything they can point to to be on the heap, or prevent them from 
escaping at all (as with ref). I which there could be some consistency 
here.

-- 
Michel Fortin
michel.fortin at michelf.com
http://michelf.com/