Decision on container design

Tue Feb 1 09:07:55 PST 2011

On 2/1/11 10:44 AM, Michel Fortin wrote:
> On 2011-02-01 11:12:13 -0500, Andrei Alexandrescu
> <SeeWebsiteForEmail at erdani.org> said:
>
>> On 1/28/11 8:12 PM, Michel Fortin wrote:
>>> On 2011-01-28 20:10:06 -0500, "Denis Koroskin" <2korden at gmail.com> said:
>>>
>>>> Unfortunately, this design has big issues:
>>>>
>>>>
>>>> void fill(Appender!string appender)
>>>> {
>>>>     appender.put("hello");
>>>>     appender.put("world");
>>>> }
>>>>
>>>> void test()
>>>> {
>>>>     Appender!string appender;
>>>>     fill(appender); // Appender is supposed to have reference semantics
>>>>     assert(appender.length != 0); // fails!
>>>> }
>>>>
>>>> The assertion above fails because, at the time you pass the appender
>>>> object to fill, it isn't initialized yet (lazy initialization). As
>>>> such, a null is passed; an instance is created at the first append,
>>>> but the result isn't visible to the caller.
>>>
>>> That's indeed a problem. I don't think it's a fatal flaw however, given
>>> that the idiom already exists in AAs.
>>>
>>> That said, the nice thing about my proposal is that you can easily reuse
>>> the Impl to build a new container wrapper with the semantics you like,
>>> with no loss of efficiency.
>>>
>>> As for the case of Appender... personally in the case above I'd be
>>> tempted to use Appender.Impl directly (value semantics) and make fill
>>> take a 'ref'. There's no point in having an extra heap allocation,
>>> especially if you're calling test() in a loop or if there's a good
>>> chance fill() has nothing to append to it.
>>>
>>> That's the issue with containers. The optimal semantics always change
>>> depending on the use case.
>>
>> Yep, yep, I found myself wrestling with the same issues. All good
>> points. On one hand containers are a target for optimization because
>> many will use them. On the other hand you'd want to have reasonably
>> simple and idiomatic code in the container implementation because you
>> want people to understand them easily and also to write their own. I
>> thought for a while of a layered approach in which you'd have both the
>> value and the sealed reference version of a container... it's just too
>> much aggravation.
>
> But are you not just pushing the aggravation elsewhere? If I need a by
> value container for some reason (performance or semantics) I'll have to
> write my own, and likely others will write their own too.

If semantics are the primary concern, you could (and in fact Phobos 
could) provide a Value!C template that automatically calls dup in 
this(this) etc.
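
A minimal sketch of what such a wrapper might look like. Value here is 
hypothetical (it is not an existing Phobos template), IntList is a 
made-up reference container, and the sketch assumes the wrapped class 
provides a dup method that returns a deep copy:

```d
// Hypothetical reference (class) container; dup returns a deep copy.
class IntList
{
    int[] items;

    void insert(int x) { items ~= x; }

    IntList dup()
    {
        auto copy = new IntList;
        copy.items = items.dup;
        return copy;
    }
}

// Sketch of a Value!C wrapper: copying the struct duplicates the payload,
// so each copy owns an independent container.
struct Value(C) if (is(C == class))
{
    C payload;

    this(C c) { payload = c; }

    // Postblit: runs after the bitwise copy, so swap the shared reference
    // for a private duplicate.
    this(this)
    {
        if (payload !is null)
            payload = payload.dup;
    }
}

void main()
{
    auto a = Value!IntList(new IntList);
    a.payload.insert(1);

    auto b = a;              // postblit runs; b gets its own IntList
    b.payload.insert(2);

    assert(a.payload.items == [1]);
    assert(b.payload.items == [1, 2]);
}
```

Every copy of the wrapper pays for a full dup, which is the cost that 
value semantics imply.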

For performance I agree there is stuff that class containers leave on 
the table.

> Using classes for containers is just marginally better than making them
> by-value structs: you can use 'new' with a by-value struct if you want
> it to behave as a class-like by-reference container:
>
> struct Container {
> ...
> }
>
> auto c = new Container();
>
> The only noticeable difference from a class container is that c is now
> a Container*.

Well, one problem now is that if you have a Container* you don't know 
whether it's dynamically allocated or the address of some 
stack-allocated object. This is pretty big; a major issue that I believe 
C++ has is that you can seldom reason modularly about functions because 
C++ makes it impossible to represent reference semantics with 
local/remote/shared/no ownership without resorting to convention.

A better solution is to define something like

auto c = new Classify!Container;

which transforms a value into a class object.
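
A rough sketch of how such a template could be written. Classify is 
hypothetical (no such template is assumed to exist in Phobos), Stack is 
a made-up value-type container, and forwarding is done through an 
explicit value field to keep the example small:

```d
// Hypothetical value-type container.
struct Stack
{
    int[] items;

    void push(int x) { items ~= x; }

    size_t length() const { return items.length; }
}

// Sketch of Classify: a class that embeds the value. Instances are created
// with new, so a Classify!T reference never points at a stack-allocated T.
class Classify(T) if (is(T == struct))
{
    T value;
}

// The callee receives a class reference, so its changes are visible
// to the caller.
void fill(Classify!Stack s)
{
    s.value.push(1);
    s.value.push(2);
}

void main()
{
    auto c = new Classify!Stack;
    fill(c);
    assert(c.value.length == 2);
}
```

Unlike a Container*, the type Classify!Stack tells the reader that the 
object was allocated as a class instance rather than obtained by taking 
the address of a local.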

With this, the question becomes a matter of choosing the right default: 
do we want values most of the time and occasional references, or vice 
versa? I think most of the time you need references, as witnessed by the 
many '&'s out there in code working on STL containers.

>>> Personally, I'm really concerned by the case where you have a container
>>> of containers. Class semantics make things really complicated as you
>>> always have to initialize everything in the container explicitly; value
>>> semantics make things semantically easier but quite inefficient, as
>>> moving elements inside of the outermost container implies copying the
>>> containers. Making containers auto-initialize themselves on first use
>>> solves the case where containers are reference types; making containers
>>> capable of using move semantics solves the problem for value-type
>>> containers.
>>
>> Neither values nor references are perfect indeed. For example, someone
>> mentioned, hey, in STL I write set< vector<double> > and it Just
>> Works(tm). On the other hand, if you swap the two names it still seems
>> to work but it's awfully inefficient (something that may trip even
>> experienced developers).
>
> Isn't that solved by C++0x, using move semantics in swap?

This particular incarnation, yes, though it took a large language change,
and it doesn't automatically fix user code that forgets the cost of
copying. My point was that values by default are not automatically a good
choice.

Andrei