Property rewriting; I feel it's important. Is there still time?

Wed Mar 10 19:14:21 PST 2010

Andrei Alexandrescu wrote:
> On 03/09/2010 09:48 PM, Chad J wrote:
>> I speak of the property rewriting where an expression like
>>
>>      foo.prop++;
>>
>> is rewritten as
>>
>>      auto t = foo.prop();
>>      t++;
>>      foo.prop(t);
> 
> This particular example has a number of issues. First off you need to
> rewrite expressions, not statements. Consider:
> 
> auto x = foo.prop++;
> 
> You'd need to assign to x the old value of foo.prop. So one correct
> rewrite is
> 
> foo.prop++
> 
> into
> 
> {auto t = foo.prop; auto t1 = t; ++t1; foo.prop = t1; return t;}()
> 
> within an rvalue context, and into:
> 
> {auto t = foo.prop; ++t; foo.prop = t; return t;}()
> 
> within a void context.
> 
> I'm pointing out that things may not always be very simple, but
> generally it's easy to figure out the proper rewrites if attention is
> given to detail.
> 

Right.  This one made itself easy to notice because if you either return
a value in a void context (ex: expression statements) or fail to return
one in a non-void context (ex: conditions for if/for/while statements
and the like) then semantic analysis fails further down the line.

What I end up doing is generating a bunch of comma expressions that hold
the rewritten property expression.  So my rewrite for "auto x =
foo.prop++;" actually looks like this:

auto x = (auto t = foo.prop, (auto t1 = t++, (foo.prop = t, t1)));

It's illegal D code, but only because of a check (in
Expression->semantic() somewhere IIRC) that prevents declaration
expressions from appearing in arbitrary places.  Once you're past that
check you can put them there and the backend knows what to do with them.
I stick t1 in there at the end to make the comma expression evaluate to
the value of t1 once the calculations are done.  If it's a void context,
I leave the trailing t1 off, because if I kept it then the compiler
would complain about an expression having no side-effects.
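
For reference, here is roughly what the two rewrites amount to when
spelled out as legal D.  This is only a sketch and not the patch itself:
Foo, postIncValue and postIncVoid are names I made up for illustration,
and the helper functions stand in for the comma expressions the compiler
would generate in place.

```d
// Hypothetical type with a getter/setter pair named "prop".
struct Foo
{
    int value;

    @property int prop() { return value; }        // getter
    @property void prop(int v) { value = v; }     // setter
}

// Value context, e.g. "auto x = foo.prop++;".
// Mirrors: (auto t = foo.prop, (auto t1 = t++, (foo.prop = t, t1)))
int postIncValue(ref Foo foo)
{
    auto t = foo.prop;   // one getter call
    auto t1 = t++;       // t1 keeps the old value, t holds the new one
    foo.prop = t;        // one setter call
    return t1;           // the expression evaluates to the old value
}

// Void context, e.g. "foo.prop++;" as an expression statement.
// Same as above without the trailing t1.
void postIncVoid(ref Foo foo)
{
    auto t = foo.prop;
    t++;
    foo.prop = t;
}

void main()
{
    Foo foo;
    foo.prop = 5;

    auto x = postIncValue(foo);
    assert(x == 5);          // x sees the value before the increment
    assert(foo.prop == 6);

    postIncVoid(foo);
    assert(foo.prop == 7);
}
```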

>> So, Walter or Andrei or someone on the planning behind the scenes,
>> please lend me your thoughts:
>> How much time is left to make this sort of thing happen?
>> If a working patch for this showed up, would it have a reasonable chance
>> of acceptance at this point?
> 
> The idea is sensible and is already in effect for the ".length" property
> of arrays.
> 
>> I really want to make this happen, even if I have to pay someone to do
>> it or finish it off.  It's very close but I have almost nil free time
>> for this stuff.
>>
>> Note that I have made it work and seen it in action.  There'd be a patch
>> two months ago if I hadn't decided to rebel against the way DMD did
>> things*.
> 
> Probably offering payment wouldn't be much of an enticement, but
> lobbying reasonable ideas here is a good way to go.
> 

I figure it might give an edge of motivation, especially to some of the
talented college students around here.  I would have probably done this
kind of thing in college if the opportunity had popped up.  I think
Spring break is about here too.

>> ...
> 
>> - Having property rewrites allows the special case for "array.length +=
>> foo;" to be removed.  Property rewriting is the more general solution
>> that will work for all properties and in arbitrary expressions.
> 
> Agreed. By the way, I'm a huge fan of lowering; I think they are great
> for defining semantics in a principled way without a large language
> core. In recent times Walter has increasingly relied on lowerings and
> mentioned to me that the code savings in the compiler have been
> considerable.
> 

Interesting.

>> - By treating opIndex and opIndexAssign as properties then that pair
>> alone will make cases like "a[i]++;" work correctly without the need for
>> opIndexUnary overloads.  Also "a[i] += foo" will work too, as well as
>> anything else you haven't thought of yet.
> 
> Well operator overloading handles indexing differently, and arguably
> better than in your proposal. Ideally we'd define operators on
> properties in a manner similar to the way indexing works in the new
> operator overloading scheme. I'll talk to Walter about that.
> 
> 
> Andrei

I wouldn't want to have to define functions for side-effectful operators
/in addition/ to the getter and setter.  The opIndexUnary/
opIndexOpAssign things have bugged me a bit because I've felt that the
value returned from opIndex should handle its own operator overloads.  I
wonder if we are talking about two different things.

The extra opIndexUnary/opIndexOpAssign overloads could supersede the
behavior of getting from opIndex, mutating a temporary, and calling
opIndexAssign with the temporary.  I'd still like to not /need/ to
define the extra operator overloads though.

Indexing seems to be the general case of properties: an indexed
expression is a getter/setter pair identified by both a name (opIndex
and opIndexAssign in this case) and some runtime values (the indices).
An ordinary property is a getter/setter pair identified by its name
alone.  So indexing isn't much harder to deal with:

    foo[i]++;

->

    {auto t = foo.opIndex(i);
     t++;
     foo.opIndexAssign(t, i); }()

Now if the index itself has side effects, then that side effect must be
pulled out of the indexing expression so it only happens once:

    foo[i++]++;

->

    {auto t = foo.opIndex(i);
     t++;
     foo.opIndexAssign(t, i);
     i++; }() // i++ is hoisted out of the indexing expression.

I think I've managed to successfully deal with that.
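
Spelled out as compilable D, the two indexing rewrites look something
like the sketch below.  Table and the two helper functions are
hypothetical names for illustration only.  Note that opIndexAssign takes
the value first and the index second.  In the second helper I capture
the old index in a temporary rather than moving i++ to the end; that is
a variant of the rewrite above which gives the same result here.

```d
// Hypothetical container that only defines the getter/setter pair.
struct Table
{
    int[] data;

    int opIndex(size_t i) { return data[i]; }
    void opIndexAssign(int value, size_t i) { data[i] = value; }
}

// What "foo[i]++;" would be rewritten to in a void context.
void postIncAt(ref Table foo, size_t i)
{
    auto t = foo.opIndex(i);     // get
    t++;                         // mutate the temporary
    foo.opIndexAssign(t, i);     // set
}

// What "foo[i++]++;" would be rewritten to: the index expression is
// evaluated exactly once and its old value is reused for get and set.
void postIncAtPostInc(ref Table foo, ref size_t i)
{
    auto idx = i++;              // side effect happens once
    auto t = foo.opIndex(idx);
    t++;
    foo.opIndexAssign(t, idx);
}

void main()
{
    auto foo = Table([10, 20, 30]);

    postIncAt(foo, 1);
    assert(foo[1] == 21);

    size_t i = 0;
    postIncAtPostInc(foo, i);
    assert(foo[0] == 11);        // element at the old index was bumped
    assert(i == 1);              // i was incremented only once
}
```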

I've also given thought to the notion of side-effects within
side-effects, and I make sure those are safely removed so that things
don't get executed twice or more in an unexpected manner.
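
To make the double-execution hazard concrete, here is a small sketch.
Everything in it (Foo, theFoo, lookups, nextFoo and the two rewrite
functions) is made up for illustration and is my guess at the simplest
case of this: the thing the property hangs off of is itself a call with
a side effect, as in "nextFoo().prop++;".

```d
class Foo
{
    int value;

    @property int prop() { return value; }
    @property void prop(int v) { value = v; }
}

Foo theFoo;      // hypothetical object handed out by nextFoo()
int lookups;     // counts how many times nextFoo() runs

Foo nextFoo()
{
    ++lookups;   // the side effect we must not duplicate
    return theFoo;
}

// Naive textual expansion of "nextFoo().prop++;":
// the receiver expression is evaluated twice.
void naiveRewrite()
{
    auto t = nextFoo().prop;
    t++;
    nextFoo().prop = t;
}

// Careful expansion: the receiver is hoisted into a temporary first,
// so its side effect runs exactly once.
void carefulRewrite()
{
    auto obj = nextFoo();
    auto t = obj.prop;
    t++;
    obj.prop = t;
}

void main()
{
    theFoo = new Foo;

    lookups = 0;
    naiveRewrite();
    assert(lookups == 2);        // side effect ran twice

    lookups = 0;
    carefulRewrite();
    assert(lookups == 1);        // side effect ran once
    assert(theFoo.prop == 2);    // both rewrites incremented the value
}
```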

And... I also handled out and ref parameters in function calls.  A
property used as a ref argument is extracted from the call and replaced
with a temporary: the getter fills the temporary before the call and
the setter stores it back afterwards.  I feel that out parameters are
similar to assignment, so a property used as an out argument will only
have its setter called.
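
As a sketch of what I mean (Foo, its counters, bump, fill and the two
call helpers are all hypothetical names, not anything from the patch):

```d
// Hypothetical type whose property accessors count their calls.
struct Foo
{
    int value;
    int gets;    // number of getter calls
    int sets;    // number of setter calls

    @property int prop() { ++gets; return value; }
    @property void prop(int v) { ++sets; value = v; }
}

void bump(ref int n) { n += 10; }   // reads and writes its argument
void fill(out int n) { n = 42; }    // only writes its argument

// What "bump(foo.prop);" would be rewritten to:
// get into a temporary, pass the temporary, set afterwards.
void callBump(ref Foo foo)
{
    auto t = foo.prop;
    bump(t);
    foo.prop = t;
}

// What "fill(foo.prop);" would be rewritten to:
// no getter call, because an out parameter is like an assignment.
void callFill(ref Foo foo)
{
    int t;
    fill(t);
    foo.prop = t;
}

void main()
{
    Foo foo;

    callBump(foo);
    assert(foo.value == 10);
    assert(foo.gets == 1 && foo.sets == 1);

    callFill(foo);
    assert(foo.value == 42);
    assert(foo.gets == 1 && foo.sets == 2);   // getter was not called again
}
```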

I just need to get the blasted thing to mesh with dmd's manner of
travelling the AST ;)