std.container and classes

Sat Dec 17 17:52:40 PST 2011

On Saturday, December 17, 2011 17:31:46 Andrei Alexandrescu wrote:
> On 12/13/11 9:08 PM, Jonathan M Davis wrote:
> > Is the plan for std.container still to have all of its containers be
> > final classes (classes so that they're reference types and final so
> > that their functions are inlinable)? Or has that changed? I believe
> > that Andrei said something recently about discussing reference counting
> > and containers with Walter.
> > 
> > The reason that I bring this up is that Array and SList are still
> > structs, and the longer that they're structs, the more code that will
> > break when they get changed to classes. Granted, some level of code
> > breakage may occur when we add custom allocators to them, but since
> > that would probably only affect the constructor (and preferably
> > wouldn't affect anything if you want to simply create a container with
> > the GC heap as you would now were Array and SList classes), the
> > breakage for that should be minimal.
> > 
> > Is there any reason for me to not just go and make Array and SList final
> > classes and create a pull request for it?
> > 
> > - Jonathan M Davis
> 
> Apologies for being slow on this. It may be a fateful time to discuss
> that right now, after all the discussion of what's appropriate for
> stdlib vs. application code etc.
> 
> As some of you know, Walter and I went back and forth several times on
> this. First, there was the issue of making containers value types vs.
> reference types. Making containers value types would be in keep with the
> STL approach. However, Walter noted that copying entire containers by
> default is most often NOT desirable and there's significant care and
> adornments in C++ programs to make sure that that default behavior is
> avoided (e.g. adding const& to function parameters).
> 
> So we decided to make containers reference types, and that seemed to be
> a good choice.
> 
> The second decision is classes vs. structs. Walter correctly pointed out
> that the obvious choice for defining a reference type in D - whether the
> type is momonorphic or polymorphic - is making it a class. If containers
> aren't classes, the reasoning went, it means we took a wrong step
> somewhere; it might mean our flagship abstraction for reference types is
> not suitable for, well, defining a reference type.
> 
> Fast forward a couple of months, a few unslept nights, and a bunch of
> newsgroup and IRC conversations. Several additional pieces came together.
> 
> The most important thing I noticed is that people expect standard
> containers to have sophisticated memory management. Many ask not about
> containers as much as "containers with custom allocators". Second,
> containers tend to be large memory users by definition. Third,
> containers are self-contained (heh) and relatively simple in terms of
> what they model, meaning that they _never_ suffer from circular
> references, like general entity types might.
> 
> All of these arguments very strongly suggest that many want containers
> to be types with deterministic control over memory and accept
> configurable allocation strategies (regions, heap, malloc, custom). So
> that would mean containers should be reference counted structs.
> 
> This cycle of thought has happened twice, and the evidence coming the
> second time has been stronger. The first time around I went about and
> started implementing std.container with reference counting in mind. The
> code is not easy to write, and is not to be recommended for most types,
> hence my thinking (at the end of the first cycle) that we should switch
> to class containers. One fear I have is that people would be curious,
> look at the implementation of std.container, and be like "so am I
> expected to do all this to define a robust type"? I start to think that
> the right answer to that is to improve library support for good
> reference counted types, and define reference counted struct containers
> that are deterministic.
> 
> Safety is also an issue. I was hoping I'd provide safety as a policy,
> e.g. one may choose for a given container whether they want safe or not
> (and presumably fast). I think it's best to postpone that policy and
> focus for now on defining safe containers with safe ranges. This
> precludes e.g. using T[] as a range for Array!T.
> 
> Please discuss.

The only reason that I can think of to use a reference-counted struct instead 
of a class is becuse then it's easier to avoid the GC heap entirely.  Almost 
all of a container's memory is going to end up on the heap regardless, because 
the elements almost never end up in the container itself. They're in a dynamic 
array or in nodes or something similar. So, whether the container is ref-
counted or a class is almost irrelevant. If anything making it a class makes 
more sense, because then it doesn't have to worry about the extra cost of ref-
counting. However, once we bring allocators into the picture, the situation 
changes slightly. In both cases, the elements themselves end up where the 
allocator puts them. However, in the case of a struct, the member variables 
themselves end up on the stack, whereas with the class, they end up on the GC 
heap unless the programmer puts the object somewhere else with emplace or 
possibly with a function on the allocator.

The other concern is that with a class, it's immediately clear that you're 
dealing with a reference, but that's much easier to miss with a struct when 
structs are almost always value types.

Also, some containers can't be an a usable state from their init value (e.g. 
RedBlackTree), so they're more awkward as structs. @disable might fix that 
problem, but I don't know. It would be also be awkward to not be able to just 
declare a RedBlackTree without immediately initializing it.

So, in general, I think that final classes make more sense. The ref-counted 
struct is just trying to do what a class does already, and the class doesn't 
have the extra overhead of the ref-counting. The only concern is the fact that 
even when using a custom allocator, the few member variables that the 
container has end up on the GC heap unless the programmer goes to extra effort 
to avoid it, and part of the point of using the custom allocatorc is 
specifically to avoid the GC heap. The allocator API could make that easy to 
deal with though by having a function which takes the type and the arguments 
for the constructor and constructs it for you. e.g.

auto arr = custom.alloc!Array(4, 5, 7);

So, I favor final classes. I just don't see much benefit in ref-counted structs. 
They're more confusing and generally harder to deal with, with the one 
exception being avoiding the GC heap for the member variables of the 
container.

- Jonathan M Davis