A possible solution for the opIndexXxxAssign morass

Wed Oct 14 13:14:31 PDT 2009

On Wed, Oct 14, 2009 at 9:34 AM, Bill Baxter <wbaxter at gmail.com> wrote:
> On Wed, Oct 14, 2009 at 7:42 AM, Jason House
> <jason.james.house at gmail.com> wrote:
>> Andrei Alexandrescu Wrote:
>>
>>> Right now we're in trouble with operators: opIndex and opIndexAssign
>>> don't seem to be up to snuff because they don't catch operations like
>>>
>>> a[b] += c;
>>>
>>> with reasonable expressiveness and efficiency.
>>
>> I would hope that *= += /= and friends could all be handled efficiently with one function written by the programmer. As I see it, there are 3 basic steps:
>> 1. Look up a value by index
>> 2. Mutate the value
>> 3. Store the result
>
> And as Chad J reminds us, same goes for in-place property mutations
> like  a.b += c.
> It's just a matter of  accessing  .b  vs .opIndex(b).   And really
> same goes for any function  a.memfun(b) += c could benefit from the
> same thing (a.length(index)+=3 anyone?)
>
>> it's possible to use opIndex for #1 and opIndexAssign for #3, but that's not efficient. #1 and #3 should be part of the same function, but I think #2 shouldnot be. What about defining an opIndexOpOpAssign that accepts a delegate for #2 and then use compiler magic to specialize/inline it?
>
> It could also be done using a template thing to inject the "mutate the
> value" operation:
>
> void opIndexOpOpAssignOpSpamOpSpamSpamSpam(string Op)(Thang c, Thing idx) {
>     ref v = <lookup [idx] however you like>
>     mixin("v "~Op~" c;");
>     <store to v to [idx] however you like>
> }
>
> or make it an alias function argument and use Op(v, b).
>
> Sparse matrices are a good case to look at for issues.  a[b] is
> defined for every [b], but if the value is zero nothing is actually
> stored.  So there may or may not be something you can return a
> reference to.   In C++ things like std::map just declare that if you
> try to access a value that isn't there, it gets created.  That way
> operator[] can always return a reference.   It would be great if we
> could make a[b] not force a ref return in cases where there is no
> lvalue that corresponds to the index (or property) being accessed.
> Gracefully degrade to the slow path in those cases.
>
> A good thing about a template is you can pretty easily specify which
> cases to allow using template constraints:
>
> void opIndexOpOpAssignOpSpamOpSpamSpamSpam(string Op)(Thang c, Thing b)
>       if (Op in "+= -=")
> {
>   ...
> }
>
> (+ 1 small dream there about 'in' being defined to mean substring
> search for string arguments -- that doesn't currently work does it?)
>
> If the template can't be instantiated for the particular operation,
> then the compiler would try to revert to the less efficient standby:
> auto tmp = a[b];
> tmp op= c;
> a[b] = tmp;
>
> The whole thing can generalize to all accessors too.  Instead of just
> passing the Op, the compiler could pass the accessor string, and args
> for that accessor.  Here an accessor means ".opIndex(b)",  ".foo", or
> even a general ".memfun(b)"
>
>
> void opIndexOpOpAssignOpSpamOpSpamSpamSpam(string Member, string
> Op)(Thang c, Thing b)
>   if (Member in ".foo() .bar() .opIndex()")
> {
>     string call = ctReplace(Member, "()", "(b)");  // Member looks
> like ".memfun()"  this turns it into ".memfun(b)"
>     ref v = mixin("this" ~ call ~ ";");
>     < any extra stuff you want to do on accesses to v >
>     mixin("v "~Op~" c;");
>     < store v back to member >
> }
>
> It's ugly and perhaps too low-level, but that can be worked on if the
> general principle is sound.   Utility functions can be defined to do
> whatever it is that turns out to be a recurring pattern.  Lack of
> being virtual could be a problem for classes.

After mulling over it some more, basically what I'm describing is
simply a function that gives the user a chance to rewrite the AST of
these kinds of ".memfun(args) op= " type operations.

When the compiler sees     "obj.memfun(b) += c"
It gives that bit of the syntax tree to the AST manipulator function
(if obj defines one) and the function can then alter it however
desired.

This is made somewhat clunky by the fact that our only representation
for ASTs is strings.

Actually this could just be a CTFE function.   It doesn't need to be a template.

Just imagine there's a compile-time struct passed in that could do
things like this:

string opWhateverAssign(AST syntax)
{
     // First some examples:
     // Assume obj.memfun(b0,b1) += c   is what appeared in source code.
     enum s0 = syntax.args;  // yields "b0, b1" -- compiler knows args
to this fn are called "b0" and "b1"
     enum s1 = syntax.args[0]; // yields "b0"
     enum s2 = syntax.rvalue; // yields "c"
     enum s3 = syntax.member;  // yields  "memfun"
     enum s4 = syntax.formatCallString("v = $syntax.member( x,y )");
// yields "v=memfun(x,y)"
     enum s5 = syntax.defaultImpl; // yields "auto v=memfun(b0,b1);
v+=c; memfun(b0,b1)=v;"

     // ok now I'll actually do something
     static if (syntax.member == "opIndex") {
           // say this is a sparse matrix class
           return ctFormat(q{
                   if (!this.matrix_contains($syntax.args)) {
                          this.create_entry($syntax.args);
                   }
                   auto v = &this.matrix_storage[$syntax.args];
                   *v  $syntax.op  $syntax.rvalue;
            });
     }
     else {
           return syntax.defaultImpl;
      }
}

This assumes we can have CTFE functions inside structs/classes.  It
assumes a function called ctFormat that can format a string at
compile-time and do perl-like variable interpolation.   It assumes we
can pass structs to CTFE functions and use them there.

Really it doesn't have to be just the opAssign type calls either.  You
could allow such interceptors for any method call or member access.

This is really close to a nemerle-like macro, actually.  Just modify 4
lines and it is one.

macro opWhateverAssign(AST syntax)
{
     // First some examples:
     // ok now I'll actually do something
     static if (syntax.member == "opIndex") {
           // say this is a sparse matrix class
           <[
                   if (!this.matrix_contains($syntax.args)) {
                          this.create_entry($syntax.args);
                   }
                   auto v = &this.matrix_storage[$syntax.args];
                   *v  $syntax.op  $syntax.rvalue;
            ]>
     }
     else {
           <[ $syntax.defaultImpl; ]>
      }
}

And this really makes me think it's silly to put off macro syntax till
D3.  Everything needed is basically there.  In contrast to a new
paradigm to reinvent parallel computing.

--bb