Ruling out arbitrary cost copy construction?

Andrei Alexandrescu SeeWebsiteForEmail at erdani.org
Wed Oct 6 09:34:54 PDT 2010


I think ranges are a great thing, having simplicity as one of their 
advantages. In the beginning they were indeed rather simple:

struct MyRange {
     bool empty();
     ref Widget front();
     void popFront();
}

with the appropriate refinements for bidirectional and random.

Then there was the need to distinguish one-pass, "input" ranges (e.g. 
files) from many-pass, "forward" ranges. So the "save" function got born 
for forward ranges and above:

struct MyRange {
     bool empty();
     ref Widget front();
     void popFront();
     MyRange save();
}

Then @property came about which required extra adornments:

struct MyRange {
     @property bool empty();
     @property ref Widget front();
     void popFront();
     @property MyRange save();
}

Then some ranges were unable to return ref, but they could receive 
assignment. I call these /sealed ranges/ because they "seal" the address 
of their elements making it inaccessible:

struct MyRange {
     @property bool empty();
     @property Widget front();
     @property void front(Widget);
     void popFront();
     @property MyRange save();
}

This made bidirectional and random-access range interfaces quite big 
because now we're talking about back() (two versions), opIndex(), 
opIndexAssign() and opIndexOpAssign().

Until now I think we're solid on design decisions. The discussion starts 
here.

And then there was this nagging thing which is well-understood in the 
C++ world. A sealed range returns by value and accepts by value. But in 
C++, an object's copy constructor is arbitrarily expensive. The library 
must be designed in accordance to that and carefully navigate around 
that issue.

For example, swap is a fundamental primitive used in many algorithms. It 
should swap objects at constant cost. Indeed, for references, swap is 
easy to implement:

void swap(T)(ref T lhs, ref T rhs)
{
     assert(!pointsTo(lhs, rhs) && !pointsTo(rhs, lhs)
         && !pointsTo(lhs, lhs) && !pointsTo(rhs, rhs));
     T tmp = move(lhs);
     lhs = move(rhs);
     rhs = move(tmp);
}

or similar. However, a sealed range does not offer references, so trying 
e.g.

swap(r1.front, r2.front);

will not work. This is a problem.

To solve that problem, I introduced moveFront(), moveBack(), and 
moveAt(size_t), all of which destructively read the front, back, or an 
indexed element respectively off the range. Then you can swap r1.front 
with r2.front at constant cost like this:

     T tmp = r1.moveFront();
     r1.front = r2.moveFront();
     r2.front = move(tmp);

All of this works and is rock-solid, but it does load the range 
interface considerably. To a newcomer coming without the background 
above, a full-fledged range definition may look quite loaded.

One simplification is to simply decree that Phobos (and D in general) 
shuns objects with eager copy. Any this(this) could be considered 
constant cost. That would have two consequences:

1. It would simplify all ranges and many algorithms because there's no 
more need for moveXxx and swapping can be done the old-fashioned way 
(with assignments in inline code):

     T tmp = r1.front;
     r1.front = r2.front;
     r2.front = tmp;

In general many things become considerably easier if you can simply 
assume that copying objects around won't be part of your complexity 
requirements or the practical costs of your algorithm.

2. It would force certain types (such as BigInt) that allocate resources 
and have value semantics to resort to reference counting.

3. It would give more legitimacy to sealed objects that return data by 
value (as opposed to reference). I believe sealed objects are very 
important for safety.

4. It would be a definite departure from C++, where all value copies are 
considered of arbitrary cost. This would provide a convenient straw-man 
for naysayers (e.g. "Hey, D calls the copy constructor even more often 
than C++! No thanks, I'll stick with C++0x which solves it all with 
rvalue references").

5. It would make things look and feel familiar to people coming from any 
other languages than C++, who seldom need to define a costly this(this) 
at all.

Please discuss.


Andrei


More information about the Digitalmars-d mailing list