std.container and classes

Tue Dec 20 07:39:55 PST 2011

Jonathan M Davis <jmdavisProg at gmx.com> writes:

> On Saturday, December 17, 2011 17:31:46 Andrei Alexandrescu wrote:
>> On 12/13/11 9:08 PM, Jonathan M Davis wrote:
>> > Is the plan for std.container still to have all of its containers be
>> > final classes (classes so that they're reference types and final so
>> > that their functions are inlinable)? Or has that changed? I believe
>> > that Andrei said something recently about discussing reference counting
>> > and containers with Walter.
>> > 
>> > The reason that I bring this up is that Array and SList are still
>> > structs, and the longer that they're structs, the more code that will
>> > break when they get changed to classes. Granted, some level of code
>> > breakage may occur when we add custom allocators to them, but since
>> > that would probably only affect the constructor (and preferably
>> > wouldn't affect anything if you want to simply create a container with
>> > the GC heap as you would now were Array and SList classes), the
>> > breakage for that should be minimal.
>> > 
>> > Is there any reason for me to not just go and make Array and SList final
>> > classes and create a pull request for it?
>> > 
>> > - Jonathan M Davis
>> 
>> Apologies for being slow on this. It may be a fateful time to discuss
>> that right now, after all the discussion of what's appropriate for
>> stdlib vs. application code etc.
>> 
>> As some of you know, Walter and I went back and forth several times on
>> this. First, there was the issue of making containers value types vs.
>> reference types. Making containers value types would be in keep with the
>> STL approach. However, Walter noted that copying entire containers by
>> default is most often NOT desirable and there's significant care and
>> adornments in C++ programs to make sure that that default behavior is
>> avoided (e.g. adding const& to function parameters).
>> 
>> So we decided to make containers reference types, and that seemed to be
>> a good choice.
>> 
>> The second decision is classes vs. structs. Walter correctly pointed out
>> that the obvious choice for defining a reference type in D - whether the
>> type is momonorphic or polymorphic - is making it a class. If containers
>> aren't classes, the reasoning went, it means we took a wrong step
>> somewhere; it might mean our flagship abstraction for reference types is
>> not suitable for, well, defining a reference type.
>> 
>> Fast forward a couple of months, a few unslept nights, and a bunch of
>> newsgroup and IRC conversations. Several additional pieces came together.
>> 
>> The most important thing I noticed is that people expect standard
>> containers to have sophisticated memory management. Many ask not about
>> containers as much as "containers with custom allocators". Second,
>> containers tend to be large memory users by definition. Third,
>> containers are self-contained (heh) and relatively simple in terms of
>> what they model, meaning that they _never_ suffer from circular
>> references, like general entity types might.
>> 
>> All of these arguments very strongly suggest that many want containers
>> to be types with deterministic control over memory and accept
>> configurable allocation strategies (regions, heap, malloc, custom). So
>> that would mean containers should be reference counted structs.
>> 
>> This cycle of thought has happened twice, and the evidence coming the
>> second time has been stronger. The first time around I went about and
>> started implementing std.container with reference counting in mind. The
>> code is not easy to write, and is not to be recommended for most types,
>> hence my thinking (at the end of the first cycle) that we should switch
>> to class containers. One fear I have is that people would be curious,
>> look at the implementation of std.container, and be like "so am I
>> expected to do all this to define a robust type"? I start to think that
>> the right answer to that is to improve library support for good
>> reference counted types, and define reference counted struct containers
>> that are deterministic.
>> 
>> Safety is also an issue. I was hoping I'd provide safety as a policy,
>> e.g. one may choose for a given container whether they want safe or not
>> (and presumably fast). I think it's best to postpone that policy and
>> focus for now on defining safe containers with safe ranges. This
>> precludes e.g. using T[] as a range for Array!T.
>> 
>> Please discuss.
>
> The only reason that I can think of to use a reference-counted struct instead 
> of a class is becuse then it's easier to avoid the GC heap entirely.  Almost 
> all of a container's memory is going to end up on the heap regardless, because 
> the elements almost never end up in the container itself. They're in a dynamic 
> array or in nodes or something similar. So, whether the container is ref-
> counted or a class is almost irrelevant. If anything making it a class makes 
> more sense, because then it doesn't have to worry about the extra cost of ref-
> counting. However, once we bring allocators into the picture, the situation 
> changes slightly. In both cases, the elements themselves end up where the 
> allocator puts them. However, in the case of a struct, the member variables 
> themselves end up on the stack, whereas with the class, they end up on the GC 
> heap unless the programmer puts the object somewhere else with emplace or 
> possibly with a function on the allocator.
>
> The other concern is that with a class, it's immediately clear that you're 
> dealing with a reference, but that's much easier to miss with a struct when 
> structs are almost always value types.
>
> Also, some containers can't be an a usable state from their init value (e.g. 
> RedBlackTree), so they're more awkward as structs. @disable might fix that 
> problem, but I don't know. It would be also be awkward to not be able to just 
> declare a RedBlackTree without immediately initializing it.
>
> So, in general, I think that final classes make more sense. The ref-counted 
> struct is just trying to do what a class does already, and the class doesn't 
> have the extra overhead of the ref-counting. The only concern is the fact that 
> even when using a custom allocator, the few member variables that the 
> container has end up on the GC heap unless the programmer goes to extra effort 
> to avoid it, and part of the point of using the custom allocatorc is 
> specifically to avoid the GC heap. The allocator API could make that easy to 
> deal with though by having a function which takes the type and the arguments 
> for the constructor and constructs it for you. e.g.
>
> auto arr = custom.alloc!Array(4, 5, 7);
>
> So, I favor final classes. I just don't see much benefit in ref-counted structs. 
> They're more confusing and generally harder to deal with, with the one 
> exception being avoiding the GC heap for the member variables of the 
> container.
>
> - Jonathan M Davis

One advantage of a struct container is that when you make it a member of a struct
or class that will have lots of instances and therefore lots of tiny
containers.  

In C++ at my work, we embed tiny vectors in objects all the time.  If
you have a situation where there's enough of the objects that the memory
actually matters, and the vectors need to be resizeable but are often
small, the container overhead matters.  With a class, you've added a
reference pointer, vtable, typeinfo on top of the storage for the
container itself.

Also, the extra dereference to get from the object to its member
container is often going to be a cache miss, slowing things down
further.  So if you embed an Array class in an object, you pay for 2
pointer dereferences to get to the data rather than one.  With a class
you'll have to roll your own to avoid the costs.

Jerry