Ruling out arbitrary cost copy construction?

Thu Nov 4 11:38:59 PDT 2010

On 10/6/10 11:34 AM, Andrei Alexandrescu wrote:
> I think ranges are a great thing, having simplicity as one of their
> advantages. In the beginning they were indeed rather simple:
>
> struct MyRange {
> bool empty();
> ref Widget front();
> void popFront();
> }
>
> with the appropriate refinements for bidirectional and random.
>
> Then there was the need to distinguish one-pass, "input" ranges (e.g.
> files) from many-pass, "forward" ranges. So the "save" function got born
> for forward ranges and above:
>
> struct MyRange {
> bool empty();
> ref Widget front();
> void popFront();
> MyRange save();
> }
>
> Then @property came about which required extra adornments:
>
> struct MyRange {
> @property bool empty();
> @property ref Widget front();
> void popFront();
> @property MyRange save();
> }
>
> Then some ranges were unable to return ref, but they could receive
> assignment. I call these /sealed ranges/ because they "seal" the address
> of their elements making it inaccessible:
>
> struct MyRange {
> @property bool empty();
> @property Widget front();
> @property void front(Widget);
> void popFront();
> @property MyRange save();
> }
>
> This made bidirectional and random-access range interfaces quite big
> because now we're talking about back() (two versions), opIndex(),
> opIndexAssign() and opIndexOpAssign().
>
> Until now I think we're solid on design decisions. The discussion starts
> here.
>
> And then there was this nagging thing which is well-understood in the
> C++ world. A sealed range returns by value and accepts by value. But in
> C++, an object's copy constructor is arbitrarily expensive. The library
> must be designed in accordance to that and carefully navigate around
> that issue.
>
> For example, swap is a fundamental primitive used in many algorithms. It
> should swap objects at constant cost. Indeed, for references, swap is
> easy to implement:
>
> void swap(T)(ref T lhs, ref T rhs)
> {
> assert(!pointsTo(lhs, rhs) && !pointsTo(rhs, lhs)
> && !pointsTo(lhs, lhs) && !pointsTo(rhs, rhs));
> T tmp = move(lhs);
> lhs = move(rhs);
> rhs = move(tmp);
> }
>
> or similar. However, a sealed range does not offer references, so trying
> e.g.
>
> swap(r1.front, r2.front);
>
> will not work. This is a problem.
>
> To solve that problem, I introduced moveFront(), moveBack(), and
> moveAt(size_t), all of which destructively read the front, back, or an
> indexed element respectively off the range. Then you can swap r1.front
> with r2.front at constant cost like this:
>
> T tmp = r1.moveFront();
> r1.front = r2.moveFront();
> r2.front = move(tmp);
>
> All of this works and is rock-solid, but it does load the range
> interface considerably. To a newcomer coming without the background
> above, a full-fledged range definition may look quite loaded.
>
> One simplification is to simply decree that Phobos (and D in general)
> shuns objects with eager copy. Any this(this) could be considered
> constant cost. That would have two consequences:
>
> 1. It would simplify all ranges and many algorithms because there's no
> more need for moveXxx and swapping can be done the old-fashioned way
> (with assignments in inline code):
>
> T tmp = r1.front;
> r1.front = r2.front;
> r2.front = tmp;
>
> In general many things become considerably easier if you can simply
> assume that copying objects around won't be part of your complexity
> requirements or the practical costs of your algorithm.
>
> 2. It would force certain types (such as BigInt) that allocate resources
> and have value semantics to resort to reference counting.
>
> 3. It would give more legitimacy to sealed objects that return data by
> value (as opposed to reference). I believe sealed objects are very
> important for safety.
>
> 4. It would be a definite departure from C++, where all value copies are
> considered of arbitrary cost. This would provide a convenient straw-man
> for naysayers (e.g. "Hey, D calls the copy constructor even more often
> than C++! No thanks, I'll stick with C++0x which solves it all with
> rvalue references").
>
> 5. It would make things look and feel familiar to people coming from any
> other languages than C++, who seldom need to define a costly this(this)
> at all.
>
> Please discuss.
>
>
> Andrei

Thanks to all for a very illuminating discussion. The dialog revealed 
that the topic of cheap copy construction is oddly linked with the topic 
of garbage collection: garbage collection calls destructors concurrently 
with other code, which means reference count code must be interlocked, 
which makes it not-so-cheap.

This is only a manifestation of a deeper problem: destructors can 
execute arbitrary code, and, if ran in an arbitrary thread, can cause 
all sorts of deadlocks and essentially render "shared" inoperant.

Walter and I discussed two possibilities this morning:

1. Never call class destructors (or never call them if more than one 
thread is running)

2. Each destructor should be called in the thread that created the object.

The second solution is quite attractive. It can be implemented by 
associating each thread with a worklist - a list of objects to destroy. 
The GC can run in any thread but it will not call destructors. Instead, 
it adds objects to be destroyed to the threads that created them (this 
means we need to add a field to each object with the thread ID of the 
creating thread). When client code calls new, the allocation routine 
will first inspect the worklist and call the destructors if needed.

I think this can be made to work and has good properties, although a 
fair amount of details need to be figured out. Please share any thoughts 
you may have.

Andrei