Escape analysis (full scope analysis proposal)
Andrei Alexandrescu
SeeWebsiteForEmail at erdani.org
Wed Nov 12 21:53:50 PST 2008
Michel Fortin wrote:
> On 2008-11-12 10:02:02 -0500, Andrei Alexandrescu
> <SeeWebsiteForEmail at erdani.org> said:
>
>> Michel Fortin wrote:
>>> On 2008-11-09 10:10:03 -0500, Andrei Alexandrescu
>>> <SeeWebsiteForEmail at erdani.org> said:
>>> So my first point is that since we have a garbage collector in D, and
>>> moreover since we're likely to get one heap per thread in D2, we
>>> don't need dynamic regions. The remaining regions are: 1) the shared
>>> heap, 2) the thread-local heap, 3) All the stack frames; and you
>>> can't allocate other stack frames than the current one. Because none
>>> of these regions require a handle to allocate into, we (A) don't need
>>> region handles.
>>>
>>> We still have many regions. Beside the two heaps (shared,
>>> thread-local), each function's stack frame, and each block within
>>> them, creates a distinct memory region. But nowhere we need to know
>>> exactly which region a function parameter comes from; what we need to
>>> know is which address outlives which pointer, and then we can forbid
>>> assigning addresses to pointers that outlive them. All we need is a
>>> relative ordering of the various regions, and for that we don't need
>>> to attach *names* to the regions so that you can refer explicitly to
>>> them in the syntax. Instead, you could say something like "region of
>>> (x)", or "region of (*y)" and that would be enough.
>>
>> But how do you type then the assignment example?
>>
>> void assign(int** p, int * r) { *p = *r; }
>>
>> How do you reflect the requirement that r's region outlives *p's region?
>>
>> But that's not even the point. Say you define some notation, such as:
>>
>> void assign(int** p, int * r) if (region(r) <= region(p));
>>
>> But the whole point of regions was to _simplify_ notations like the
>> above into:
>>
>> void assign(region R)(int*R* p, int *R r);
>>
>> So although you think you simplified things by using region(symbol)
>> instead of symbolic names, you complicated things. The compiler still
>> needs to infer regions for each value, so it is as complicated as a
>> named-regions compiler, and in addition you require the user to write
>> bulkier expressions because you disallow use of symbols. So everybody
>> is worse off. Note how in the example using a symbolic region the
>> outlives relationship is enforced implicitly by using the same symbol
>> name in two places.
>
> Everywhere I said there was no need for named regions, I also said named
> regions could be kept to ease the syntax. That said, I'm not so sure
> named regions are that good at simplifying the syntax. In your assign
> example above, the named-region version has an error: it forces the two
> pointers to be of the same region. That could be fine, but, assuming
> you're assigning to *p, it'd be more precise to write it like that:
>
> void assign(region R1, region R2)(int*R1* p, int*R2 r) if (R1 <= R2);
No, the code is correct as written (without the if). You may want to
reread the paper with an eye for region subtyping rules. This partly
backs up my point: understanding region analysis may be quite a burden
for the average programmer. Even you, who took pains to think through
everything and absorb the paper, are having trouble. And me too to be
honest :o).
> Once we get there, I think the no-named region syntax is better.
This is invalidated by the wrong assertion above.
> That
> said, for the swap example, where both values need to share the same
> region, the named region notation is simpler:
>
> void swap(region R)(int*R a, int*R b);
> void swap(int* a, int* b) if (region(a) == region(b));
No, for that swap there is no need to specify any region. You can swap
ints in any two regions. Probably you meant to use int** throughout.
> But I'd argue that most of the time regions do not need to be equal, but
> are subset or superset of each other, so reusing variable names makes
> more sense in my opinion.
Don't forget that using a region name twice may actually work with two
different regions, so far as they are in a subtyping relationship.
Region subtyping is key to both simplifying code and to understanding
code after simplification.
> In any case, I prefer a notation where regions constrains are attached
> directly to the type instead of being expressed somewhere else.
> Something like this (explained below):
>
> void assign(int*(r)* p, int* r) { *p = r; }
> void swap(ref int*(b) a, ref int*(a) b);
Sure. I'm sure there's understanding that that doesn't make anything any
simpler or any easier to implement or understand. It's just a minor
change in notation, and IMHO not to the better.
> Here, a parenthesis suffix after a pointer indicates the region
> constrain of the pointer, based on the region of another pointer.
I thought it means pointer to function. Oops.
> In the
> first example, int*(r)* means that the integer pointer "*p" must not
> live beyond the value pointed by "r" (because we're going to assign "r"
> to "*p"). In the second example, the value pointed by "a" must not live
> longer than the one pointed by "b" and the value pointed by "b" must not
> live longer than the one pointed "a"; the net result is that they must
> have the same lifetime and need to be in the same region.
>
> For something more complicated, you could give multiple commas-separated
> constrains:
>
> void choose(ref int*(a,b) result, int* a, int* b)
> {
> result = rand() > 0.5 ? a : b;
> }
This all is irrelevant. You essentially change the syntax. Syntax is,
again, the least of the problems to be solved.
>> I suspect there are things you can't even express without symbolic
>> regions. Consider this example from Dan's slides:
>>
>> struct ILst(region R1, region R2) {
>> int *R1 hd;
>> ILst!(R1, R2) *R2 tl;
>> }
>>
>> This code reflects the fact that the list holds pointer to integers in
>> one region, whereas the nodes themselves are in a different region. It
>> would be a serious challenge to tackle that without symbolic regions,
>> and simpler that won't be for anybody.
>
> Today's templates are just fine for that. Just propagate variables
> through template arguments and apply region constrains to the members:
>
> struct ILst(alias var1, alias var2) {
> int*(var1) hd;
> ILst!(var1, var2)*(var2) tl;
> }
>
> int z;
> int*(z) a, b;
> ILst!(a, b) lst1;
> ILst!(&z, &z) lst2;
I hope you agree that this is just written symbols without much meaning.
This is not half-baked. It's not even rare. The cow is still moving. I
can't eat that! :o) I can't even start replying to it because there are
so many actual and potential issues, I'd need to get to work on them first.
> We could even allow regions to propagate through type arguments too:
>
> struct ILst2(T1, T2) {
> int*(T1) hd;
> ILst2!(T1, T2)*(T2) tl;
> }
> ILst2!(typeof(&z), typeof(b)) lst3;
>
> I think this example is a good case for attaching region constrains
> directly to types instead of expressing them as conditional expressions
> elsewhere, as in "if (region a <= region b)".
I am thoroughly lost here, sorry. I can't even answer "this is so wrong"
or "this is pure genius". Probably it's somewhere in between :o). At any
rate, I suggest you develop a solid understanding of Cyclone if you want
to build something related to it.
[In the interest of coherence I snipped away unrelated parts of the
discussion.]
>> I'm sorry about how you feel. Now we're in a conundrum of sorts. You
>> seem to strongly believe you can make some nice simplified regions
>> work, and make people like them. Taking that to a proof is hard. The
>> conundrum is, you are facing the prospect of putting work into it and
>> creating a system that, albeit correct, is not enticing.
>
> Currently, I'm just trying to convince you (and any other potential
> silent listeners) that it can work.
I understand I've been blunt throughout this post, but please side with
me for a minute. I'm doing so for the following reasons: (a) I'm
essentially writing this post in negative time; (b) I believe you
currently don't have an attack on the problem you're trying to solve;
(c) I believe it's worthwhile for you to develop an attack on the
problem, (d) I think "we" = "the D community" should seriously consider
safety and consequently things like region analysis.
You can now stop siding with me and side again with yourself. At this
point you can easily guess that all of the above was to prepare you for
an even blunter comment. Here goes.
You say you want to convince people "it can work". But right now there
is no "it". You have no "it". Much less an "it" that can work.
But there is of course good hope that an "it" could emerge, and I
encourage you to continue working towards that goal. It's just a lot
more work than it might appear.
Andrei
More information about the Digitalmars-d
mailing list