Ruling out arbitrary cost copy construction?

Fri Oct 29 11:37:59 PDT 2010

On Friday, October 29, 2010 10:51:17 Andrei Alexandrescu wrote:
> On 10/29/10 12:21 CDT, Jonathan M Davis wrote:
> > Honestly, what I expect to happen if constant-copy cost is expected is
> > that code won't be written that way and the code that expects a
> > constant-copy cost is going to be slow. Really, I'd say that in the
> > general case, either arbitrary copy construction is going to be
> > expected, and there will be code which assumes constant-copy cost and so
> > is slow, or constant-copy cost construction is going to be expected and
> > any postplits which don't have a constant-copy cost will are going to be
> > inefficient in various places.
> > 
> > I really don't think that you have any hope of enforcing either scheme.
> 
> That is correct. I don't want to enforce as much as choosing one stance
> and sticking with it throughout the standard library. The STL is
> consistently making the following stated assumptions:
> 
> 1. Functors and iterators are cheap to copy
> 2. General objects (including containers) are arbitrarily expensive to copy
> 
> Once STL chose a way, everyone using it could figure out a way to use it
> gainfully, or to debug performance problems if there were any.
> 
> > Programmers just won't think about it in the general case. So, unless the
> > "mandatory refcounting for value type" (and I have no clue how you'd do
> > that) is actually enforced, you can't enforce either, and you _will_
> > have inefficiencies. It's just a question of where and what kind.
> 
> Walter and I have been seriously discussing introducing @refcounted
> which would automate part of the process. The problem is you can't
> automate all or even most of it. Either you must assume client code uses
> const consistently, or that client code inserts manual calls whenever
> they plan to change the object.
> 
> > If anything, I'm inclined to say that we assume that the postblit is O(1)
> > and let the programmer worry about any inefficiencies. We can point out
> > that anything worse that O(1) will be a performance problem, but it
> > seems to me that any attempt to either accomodate arbitrary cost
> > postblit constructors or to try and use any kind of scheme which forces
> > programmers to write postblits in a certain way is too complicated and
> > doomed to failure. And even if it works, it will be highly annoying to
> > deal with.
> 
> It sure is annoying, but it does work.
> 
> Don, can you estimate how difficult it would be to convert BigInt to a
> refcounted implementation?

Okay, then essentially what you're suggesting is that Phobos either

1. Attempts to avoid copying and has constructs such as moveFront() to avoid it.

2. Or makes all of its structs which have arbitrary-cost postblits into 
reference types which are refcounted and leave user code to fend for itself.

If that's the case, then neither really strikes me as much of a departure from 
C++. In the first case, we are - for better or worse - facilitating code which 
has expensive postblit constructors. In the second, we're letting them fend for 
themselves (like C++ does), but are within Phobos using ref-counting for structs 
with expensive postblits and potentially suggesting that that's what D 
programmers in general should use when they have structs which are expensive to 
copy.

Unless I'm misunderstanding something here, neither solution is really a 
departure from C++. It's just that in one case we make what a C++ programmer 
would likely have done more efficient whereas in the other we're suggesting an 
idiom which a C++ programmer would likely not have used, but if the C++ 
programmer programs using an arbitrary-cost postblit like he would have done 
with an arbitrary-cost copy constructor in C++, the situation is essentially the 
same as in C++ - except perhaps if it's a expensive to copy a range, which 
really shouldn't be the case usually.

And if both situations aren't all that much of a departure from C++, then it's 
really a question of which is best for D. The first solution has the advantage 
that user code doesn't have to care, but it complicates Phobos and the code of 
anyone looking to write a container type, a range-type, or a range-based 
algorithm (the worst of those likely being the algorithm since that's the one of 
the 3 that people are most likely to be doing). So, it hides the issues but 
imperfectly. In the second solution, anyone writing structs which are expensive 
to copy needs to care (like they probably should anyway), but Phobos is 
simplified.

In the general case, I'd say that complicating Phobos in favor of simplifying 
user code would be the correct decision, since it affects the fewest programmers, 
but ranges are far-reaching enough that it's probably better to go with the 
second solution (or just outright ignore the issue like I suggested, though if 
there are any structs in Phobos with arbitrary-cost postblit constructors where 
this could be an issue, we probably should deal with them - be it by ref-
counting or whatever makes the most sense).

- Jonathan M Davis