Short list with things to finish for D2

Thu Nov 19 07:56:50 PST 2009

Steven Schveighoffer wrote:
> On Wed, 18 Nov 2009 18:14:08 -0500, Andrei Alexandrescu 
> <SeeWebsiteForEmail at erdani.org> wrote:
> 
>> We're entering the finale of D2 and I want to keep a short list of 
>> things that must be done and integrated in the release. It is clearly 
>> understood by all of us that there are many things that could and 
>> probably should be done.
>>
>> 1. Currently Walter and Don are diligently fixing the problems marked 
>> on the current manuscript.
>>
>> 2. User-defined operators must be revamped. Fortunately Don already 
>> put in an important piece of functionality (opDollar). What we're 
>> looking at is a two-pronged attack motivated by Don's proposal:
>>
>> http://prowiki.org/wiki4d/wiki.cgi?LanguageDevel/DIPs/DIP7
>>
>> The two prongs are:
>>
>> * Encode operators by compile-time strings. For example, instead of 
>> the plethora of opAdd, opMul, ..., we'd have this:
>>
>> T opBinary(string op)(T rhs) { ... }
>>
>> The string is "+", "*", etc. We need to design what happens with 
>> read-modify-write operators like "+=" (should they be dispatched to a 
>> different function? etc.) and also what happens with index-and-modify 
>> operators like "[]=", "[]+=" etc. Should we go with proxies? Absorb 
>> them in opBinary? Define another dedicated method? etc.
> 
> I don't like this.  The only useful thing I can see is if you wanted to 
> write less code to do an operation on a wrapper aggregate, such as an 
> array, where you could define all binary operations with a single mixin.
> 
> Other than that, it munges together all binary operations into a single 
> function. When all those operations are different, it:
> 
> 1) prevents code separation from things that are considered separately

(I'll retort inline for each point.) That's exactly the opposite 
of what my experience with C++ and D operator overloading suggests: most 
of the time (a) I need to overload operators in large groups, (b) I need 
to do virtually the same actions for each operator in a group.

Note that with opBinary you have unprecedented flexibility on how you 
want to group operators. Consider:

struct A {
     A opBinary(string op)(A rhs)
         if (op == "+" || op == "-" || op == "*" || op == "/" ||
             op == "^^")
     {
         ...
     }
     A opBinary(string op)(A rhs) if (op == "~")
     {
         ...
     }
     ...
}

So anyway I contend that your argument is not correct. The "if" clause 
allows you to separate code for things that are considered separately. 
So essentially you can do things with one function per operator if you 
so wanted. Correct?
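
To make the grouping concrete, here is a minimal sketch. Arr is a hypothetical wrapper invented for this example, and the string mixin is just one way to share a body across a group; treat it as an illustration under those assumptions rather than a settled design:

```d
// Hypothetical wrapper type, for illustration only.
struct Arr {
    int[] data;

    // One body serves the whole arithmetic group named in the constraint.
    Arr opBinary(string op)(Arr rhs)
        if (op == "+" || op == "-" || op == "*")
    {
        assert(data.length == rhs.data.length);
        auto result = new int[data.length];
        foreach (i; 0 .. data.length)
            // The mixin pastes the operator token into an ordinary expression.
            result[i] = mixin("data[i] " ~ op ~ " rhs.data[i]");
        return Arr(result);
    }

    // Catenation is considered separately, so it gets its own overload.
    Arr opBinary(string op)(Arr rhs) if (op == "~")
    {
        return Arr(data ~ rhs.data);
    }
}
```

With this in place, Arr([1, 2]) + Arr([3, 4]) goes through the first overload and Arr([1, 2]) ~ Arr([3]) through the second.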

> 2) makes operators non-virtual, which can be solved by a thunk, but that 
> seems like a lot of boilerplate code that will just cause bloat

Bloat of source or bloat of binary code? I don't know about the latter, 
but the former is actually nothing to worry about - it's easier to 
define an interface or a mixin to convert from the proposed approach to 
the old approach, than vice versa.
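
A rough sketch of such a mixin follows. VirtualArith, Number, Capped, add, and sub are all hypothetical names (add and sub stand in for the old opAdd and opSub), so this shows the shape of the idea and not an existing facility:

```d
// Hypothetical helper: routes the templated operator to ordinary virtual methods.
mixin template VirtualArith(T)
{
    T opBinary(string op)(T rhs)
    {
        static if (op == "+") return add(rhs);
        else static if (op == "-") return sub(rhs);
        else static assert(0, "operator " ~ op ~ " is not forwarded");
    }
}

class Number
{
    int value;
    this(int value) { this.value = value; }

    mixin VirtualArith!Number;

    // Plain virtual methods; derived classes may override them.
    Number add(Number rhs) { return new Number(value + rhs.value); }
    Number sub(Number rhs) { return new Number(value - rhs.value); }
}

class Capped : Number
{
    this(int value) { super(value); }

    // Reached through the inherited, non-virtual opBinary.
    override Number add(Number rhs)
    {
        int sum = value + rhs.value;
        return new Capped(sum > 100 ? 100 : sum);
    }
}
```

The boilerplate is written once in the mixin; each class that wants virtual operators adds a single mixin line.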

> 3) If you derive from a class that implements an operator, and you want 
> to make that operator virtual, it will be impossible

It means that the base class didn't intend for the operator to be 
overridable. If its authors had wanted to make it configurable, they 
would have forwarded the operator to a virtual function.

> 4) auto-generated documentation is going to *really* suck

Agreed.

> 5) you can't define operators on interfaces, or if you do, it looks 
> ridiculous (a thunk function that dispatches to the virtual methods).

interface Ridiculous {
     // Final functions in interfaces are allowed per TDPL
     Ridiculous opBinary(string op)(Ridiculous rhs) if (op == "+") {
         return opAdd(rhs);
     }
     // Implement this
     Ridiculous opAdd(Ridiculous);
}

You can group things as you wish and combine virtual calls with string 
comparisons if that helps:

interface Ridiculous {
     // Final functions in interfaces are allowed per TDPL
     Ridiculous opBinary(string op)(Ridiculous rhs)
         if (op == "+" || op == "-" || op == "*" || op == "/" ||
             op == "^^")
     {
         return opArith(op, rhs);
     }
     // Implement this
     Ridiculous opArith(string, Ridiculous);
}
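
For completeness, here is how an implementer might look. This is only a sketch: Arith, IntBox, and value are names I made up, and the set of operators is trimmed to three to keep it short:

```d
interface Arith
{
    // Template members of an interface are not virtual; this one only forwards.
    Arith opBinary(string op)(Arith rhs)
        if (op == "+" || op == "-" || op == "*")
    {
        return opArith(op, rhs);
    }

    // Implement these
    Arith opArith(string op, Arith rhs);
    int value();
}

// Hypothetical implementation, for illustration only.
class IntBox : Arith
{
    private int n;
    this(int n) { this.n = n; }

    int value() { return n; }

    // The operator arrives as a run-time string and is dispatched virtually.
    Arith opArith(string op, Arith rhs)
    {
        switch (op)
        {
            case "+": return new IntBox(n + rhs.value());
            case "-": return new IntBox(n - rhs.value());
            case "*": return new IntBox(n * rhs.value());
            default:  assert(0, "unsupported operator " ~ op);
        }
    }
}
```

Client code writes a * b on two Arith references and never sees the string.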

> 6) implementing a new operator in a derived class is virtually 
> impossible (no pun intended).

class Base {
     Base opBinary(string op)(Base rhs) if (op == "+") {
         ...
     }
}

class Derived : Base {
     Derived opBinary(string op)(Derived rhs) if (op == "-") {
         ...
     }
}

When you do so, you retain the advantage of grouping operators together 
(I think it's most likely that Base defines operators of one kind e.g. 
arithmetic and Derived defines operators of a different kind e.g. logic 
or catenation). Add thunking as you need and you're good to go.

> I imagine that dcollections for example will be *very* hard to write 
> with this change.

I hope my arguments above convinced you of the contrary.

> Seems like you are trying to solve a very focused problem without 
> looking at the new problems your solution will cause outside that domain.

You are correct in that I'm trying to smooth things primarily for 
structs. But I'll say that the templated approach is no slouch and can 
accommodate classes with virtual functions very capably, even though it 
is a bit more work than before.

One question is whether operators are overloaded more often for 
structs or for classes. I imagine dcollections defines catenation and 
slicing, but not the bulk of operators. As far as I can tell, the vast 
majority of operator overloading is done on value types.

> Can we do something like how opApply/ranges resolves? I.e. the compiler 
> tries doing opAdd or opMul or whatever, and if that doesn't exist, try 
> opBinary("+").

I wouldn't want to have too many layers that do essentially the same thing.

>> 3. It was mentioned in this group that if getopt() does not work in 
>> SafeD, then SafeD may as well pack and go home. I agree. We need to 
>> make it work. Three ideas discussed with Walter:
>>
>> * Allow taking addresses of locals, but in that case switch allocation 
>> from stack to heap, just like with delegates. If we only do that in 
>> SafeD, behavior will be different than with regular D. In any case, 
>> it's an inefficient proposition, particularly for getopt() which 
>> actually does not need to escape the addresses - just fills them up.
> 
> Perhaps, but getopt is probably not the poster child for optimizing 
> performance -- you most likely call it once, changing that single 
> application to use heap data isn't going to make a difference.

I agree. My fear is that getopt is only an example of a class of functions.

>> * Allow @trusted (and maybe even @safe) functions to receive addresses 
>> of locals. Statically check that they never escape an address of a 
>> parameter. I think this is very interesting because it enlarges the 
>> common ground of D and SafeD.
> 
> I think allowing calling @trusted or @safe functions with addresses to 
> locals is no good for @safe functions (i.e. a @safe function calls a 
> @trusted function with an address to a local without heap-allocating).  
> Remember the "returning a parameter array" problem...

I've been thinking more of examples where you pass a pointer to a 
@trusted or @safe function and that function escapes the pointer. I 
couldn't find an example. So maybe allowing that is a good solution.

How would returning a parameter array break things?
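
To pin down what "escape" means here, a minimal sketch. The names leaked, fill, escape, and demo are made up, and no such static check exists yet; the comments only mark which function the check would have to accept and which it would have to reject:

```d
int* leaked; // hypothetical module-level variable, used only to show an escape

// Only writes through the pointer; the address does not outlive the call.
// This is the pattern the proposed check would accept.
void fill(int* p) { *p = 42; }

// Stores the pointer somewhere that outlives the call.
// This is the pattern the proposed check would have to reject.
void escape(int* p) { leaked = p; }

int demo()
{
    int local;
    fill(&local);   // the kind of call getopt needs
    return local;
}
```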

>> * Figure out a way to reconcile "ref" with variadics. This is the 
>> actual reason why getopt chose to traffic in addresses, and fixing it 
>> is the logical choice and my personal favorite.
> 
> This sounds like the best choice.

Well it's not that simple. As I explained in a different post, getopt 
takes (string, pointer, string, pointer, string, pointer, ...). Now we 
need to make it take references instead of pointers, but the strings 
should stay values. We can't express a checkered constraint like that.
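
For reference, a call in today's pointer-based style looks roughly like this. Options and parse are hypothetical names for this example, and the option names are arbitrary:

```d
import std.getopt;

// Hypothetical holder for the parsed values.
struct Options { string name; int count; }

Options parse(string[] args)
{
    string name;
    int count;
    // Alternating (string, pointer) pairs: the strings are passed by value,
    // the targets by address of a local.
    getopt(args, "name", &name, "count", &count);
    return Options(name, count);
}
```

Calling parse(["prog", "--name=d2", "--count=3"]) fills name with "d2" and count with 3; the pointers are only written through, never stored.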

Incidentally there's a theory for allowing that; it's called "regular 
types", inspired by regular grammars. With a regular type you can 
define getopt's signature as one or more pairs of string and ref. 
(Unfortunately C++ defined regular types differently, which makes things 
difficult to search for.) Anyhow, I don't think such an approach would 
help D - it's too complicated.

>> 6. There must be many things I forgot to mention, or that cause grief 
>> to many of us. Please add to/comment on this list.
> 
> I know it's not part of the spec, but I'm not sure if you mention the 
> array "data stomping" problem in the book.  If not, the MRU cache needs 
> to be implemented.

Yes, it will be, because the book has a few failing unittests. In fact, I 
was hoping I could talk you or David into doing it :o).

Andrei