Decision on container design

Tue Feb 1 11:26:26 PST 2011

On 01.02.2011 18:08, Steven Schveighoffer wrote:
> On Tue, 01 Feb 2011 11:44:36 -0500, Michel Fortin
> <michel.fortin at michelf.com> wrote:
>
>> On 2011-02-01 11:12:13 -0500, Andrei Alexandrescu
>> <SeeWebsiteForEmail at erdani.org> said:
>>
>>> On 1/28/11 8:12 PM, Michel Fortin wrote:
>>>> On 2011-01-28 20:10:06 -0500, "Denis Koroskin" <2korden at gmail.com>
>>>> said:
>>>>
>>>>> Unfortunately, this design has big issues:
>>>>> void fill(Appender appender)
>>>>> {
>>>>> appender.put("hello");
>>>>> appender.put("world");
>>>>> }
>>>>> void test()
>>>>> {
>>>>> Appender<string> appender;
>>>>> fill(appender); // Appender is supposed to have reference semantics
>>>>> assert(appender.length != 0); // fails!
>>>>> }
>>>>> Asserting above fails because at the time you pass appender
>>>>> object to
>>>>> the fill method it isn't initialized yet (lazy initialization). As
>>>>> such, a null is passed, creating an instance at first appending, but
>>>>> the result isn't seen to the caller.
>>>> That's indeed a problem. I don't think it's a fatal flaw however,
>>>> given
>>>> that the idiom already exists in AAs.
>>>> That said, the nice thing about my proposal is that you can easily
>>>> reuse
>>>> the Impl to create a new container to build a new container
>>>> wrapper with
>>>> the semantics you like with no loss of efficiency.
>>>> As for the case of Appender... personally in the case above I'd be
>>>> tempted to use Appender.Impl directly (value semantics) and make fill
>>>> take a 'ref'. There's no point in having an extra heap allocation,
>>>> especially if you're calling test() in a loop or if there's a good
>>>> chance fill() has nothing to append to it.
>
> I've been thinking of making Appender.Impl public or at least its own
> type. A stack-based appender makes a lot of sense when you are using
> it temporarily to build an array.
>
> But array-based containers really are in a separate class from
> node-based containers. It's tempting to conflate the two because they
> are both 'containers', but arrays allow many more
> optimizations/features that node-based containers simply can't do.
>
>>> Yep, yep, I found myself wrestling with the same issues. All good
>>> points. On one hand containers are a target for optimization
>>> because many will use them. On the other hand you'd want to have
>>> reasonably simple and idiomatic code in the container
>>> implementation because you want people to understand them easily
>>> and also to write their own. I thought for a while of a layered
>>> approach in which you'd have both the value and the sealed
>>> reference version of a container... it's just too much aggravation.
>>
>> But are you not just pushing the aggravation elsewhere? If I need a
>> by value container for some reason (performance or semantics) I'll
>> have to write my own, and likely others will write their own too.
>
> foo(container.dup) ; // value semantics
>
> I'm sure some template guru can make a wrapper type for this for the
> rare occasions that you need it. We can work on solving the
> "auto-initialization" issue (where a nested container 'just works'), I
> think there are ways to do it.
>
> If that still doesn't help for your issues, then writing your own may
> be the only valid option.
>
>>
>> Using classes for containers is just marginally better than making
>> them by-value structs: you can use 'new' with a by-value struct if
>> you want it to behave as a class-like by-reference container:
>>
>> struct Container {
>> ...
>> }
>>
>> auto c = new Container();
>>
>> The only noticeable difference from a class container is that now c
>> is now a Container*.
>
> And doesn't get cleaned up by the GC properly. Plus, each member call
> must check if the container is 'instantiated', since we can have no
> default ctors.
>
> Yes, it's a trade-off, and I think by far class-based containers win
> for the common case.
>
>>>> Personally, I'm really concerned by the case where you have a
>>>> container
>>>> of containers. Class semantics make things really complicated as you
>>>> always have to initialize everything in the container explicitly;
>>>> value
>>>> semantics makes things semantically easier but quite inefficient as
>>>> moving elements inside of the outermost container implies copying the
>>>> containers. Making containers auto-initialize themselves on first use
>>>> solves the case where containers are references-types; making
>>>> containers
>>>> capable of using move semantics solves the problem for value-type
>>>> containers.
>>> Neither values nor references are perfect indeed. For example,
>>> someone mentioned, hey, in STL I write set< vector<double> > and it
>>> Just Works(tm). On the other hand, if you swap the two names it
>>> still seems to work but it's awfully inefficient (something that
>>> may trip even experienced developers).
>>
>> Isn't that solved by C++0x, using move semantics in swap?
>>
>
> swap isn't the problem.
>
> foreach(s; myVectorSet)
> {
> // if s is by value, it must be copied for each iteration in the loop
> }
 >
 > -Steve

Just to note: the "correct" solution for the last problem is

foreach(const ref s; myVectorSet)

which is working in current D. In a more value-based language you may 
even want to default to "const ref" for foreach-loop-values, and even 
for function-parameters. I suggested that a while ago, but wasn't 
liked much for D, for good reasons.

- Krox