How do I find the arity of an Expression? (dmd hacking)

Mon Nov 30 17:59:00 PST 2009

I think I'll reply to both of you in one post since the thoughts are
related.

Ary Borenszweig wrote:
> Ellery Newcomer wrote:
>>
>> Not that I know anything about DMD outside parse.c, but in
>> expression.h, there be decls along the lines of
>>
>> struct UnaExp{
>>   Expression* e1;
>> }
>> struct BinExp{
>>   Expression* e1;
>>   Expression* e2;
>> }
>>
>> Iterating them would just be a tree walk. That takes care of 90% and ...
>>
>> Oh. Yeah. There are a bunch of special cases.
>>
>> But from what I've seen, iterating over an expression or any part of
>> the AST involves implementing a function (like semantic) at each
>> subclass. I'm betting there is no other way to do this.
>>

D'oh.  Yep that's what I was noticing too.  Thanks for confirming that.

>> I love ANTLR. It generates an arbitrary child/sibling AST, and walking
>> it is a piece of cake.
> 
> It would be better if dmd's source code implemented the visitor pattern.
>  Descent's port implements it and it uses it a a lot of places.

I wonder if Walter would go for it.  Is he very receptive of purely
architectural patches for dmd?

...

If you care for a bit of reading, I'll tell you my story and ping an idea.

This is about the property expression rewrite of course.  I'd love to
just use the current convention in dmd and write the rewrite as a
non-recursive function that gets called at every point in the tree
whenever someExpr->semantic(sc) is called.  However, there's a snag.

When doing the property rewrite, the inner-most subexpression will need
to generate the outermost temporaries.

An example:

prop1.prop2 += foo;

AFAIK, the compiler sees this as ((prop1).prop2) += foo; or
((prop1()).prop2()) += foo; after resolveProperties is called.  So prop1
is the innermost expression.

It would be rewritten as this comma expression:

(auto t1 = prop1()),
(auto t2 = t1.prop2()),
(t2 += foo),
(t1.prop2(t2)),
(prop1(t1))

So t1 is the outermost temporary, and it corresponds to prop1, the
innermost expression.

This is problematic in dmd's traditional approach, because the semantic
pass on prop1 does not have access to it's enclosing expression (as far
as I can tell).  If it did, I'd just go to the root of the tree, uproot
the tree, stick it in a comma expression, and just start nesting:

CommaExp // Step 1a for the rewrite.
|
+--> auto t1 = prop1()
|
+--> CommaExp // Step 1b
.    |
.    +--> CommaExp // Step 2a
.    |    |
.    |    +--> auto t2 = t1.prop2()
.    |    |
.    |    +--> CommaExp // Step 2b
.    |         |
.    |         +--> t2 += foo; // Originally the root.
.    |         |
.    |         +--> t1.prop2(t2);
.    |
.    +--> prop1(t1)

Then as long as the innermost expression does its rewrite before its
parent, then it will work.  If it doesn't go first, well I dunno.  I
don't seem to have this option anyways.  I can't find a way to get the
root.

Perhaps adding a root expression pointer to the Scope (sc) object that
gets tossed around semantic would work, but I haven't considered the
bookkeeping involved with that.  I already started taking a different
approach before thinking of that.

My plan then is to do the rewrite after the root's expr->semantic(sc)
call.  It will probably look like expr->doPropertyRewrite(), and be
called from statement.c.  But in there will be the guts of the function
that has a signature like this:

Expression *doPropertyRewriteGreedily(
    Expression *expr,    // The current expression being rewritten.
    Expressions *prolog, // Temporary decls and getter calls go here.
    Expressions *epilog, // Setter calls go here.
    int treatAsLvalue    // Is this expression mutated by its parent?
    )

This works out pretty nice since adding the temporaries and such into
the prolog and epilog looks very natural.  The function just calls
itself to walk the AST.

The problem is that walking the AST thing.  The function has to know
about expr's children in order to recurse on them.  This is easy in some
cases like postfix ++ and --, because that pattern is represented by one
subclass of Expression: PostExp.  Same for the properties themselves
(CallExp).  But then there's the more general patterns like all other
impure expressions (+=,-=,*=,/=, etc) and all pure expressions ever
(+,-,*,/,%,&,|, pure functions, literals, etc).  In a lot of those cases
the rewrite doesn't care about analyzing the expression beyond a certain
point, or at all, and it just wants to forward the children to the next
recursion level.  That's where I hit this iteration problem and all of
the special cases.

Now for the idea.

Currently I'm planning an approach that's a bit more minimalistic than
the visitor pattern; just to solve my immediate problem.  I will define
these methods in Expression:

virtual Expression *getChild(unsigned index);
virtual void setChild(unsigned index, Expression* val);
virtual unsigned numChildren();

Then I will implement them as needed in the subclasses of Expression.  I
figure I'd have to write this logic into my function anyways, so I might
as well make this usable to others.  I've already written a draft of the
property rewrite that uses these, so I know they get the job done in
principle.  I just hope this will all be OK in the patch (or even holds
up under debugging).

So yeah, if I haven't put you to sleep already, please let me know if I
messed up somewhere.

- Chad