Container hierarchy vs. container types

Thu Mar 4 15:22:55 PST 2010

Hi everyone,

I'm on a trip to Romania visiting relatives. Being disconnected for a 
while, I thought some about container types and started writing some code.

It didn't take long to figure that a classic class hierarchy with 
inheritance etc. would contain prohibitively many types (with awkward 
names too), or would lose a lot of expressiveness. There are just so 
many types of containers, and they have so different characteristics.

My second shot was with a capability-based architecture suggested (in an 
unrelated context) by Scott Meyers 
(http://www.artima.com/cppsource/codefeatures.html - good article, 
recommended regardless). Some capabilities I came up with were:

enum ContCaps : uint
{
     hasLength = 1,
         hasAssignableLength = 2,
         hasPushBack = 4,
         hasRemoveBack = 8,
         hasPushFront = 16,
         hasRemoveFront = 32,
         linear = 64,
         hasFront = 128,
         hasBack = 256,
         randomAccess = 512,
}

A container would look like this:

interface Container(T, uint caps = 0) ...

The trick is that any container could contain more or less any free 
combination of capabilities. There may be a few combinations that don't 
quite make sense, but most do.

The capability-based design prescribes that a container with N 
capabilities inherits each and every container with N-1 capabilities 
obtained by chopping away one capability in turn. For example, a 
container with Container!(int, 19) would inherit Container!(int, 18), 
Container!(int, 17), and Container!(int, 3). That was a bit difficult to 
implement, but I pulled it off with a string mixin.

The problem is that the number of effective base interfaces grows 
exponentially with the number of capabilities. (That's a problem also 
experienced by Scott.) If I instantiated a Container with all 
capabilities, the compiler would hang forever. If I had 6-7 
capabilities, the code would compile after a while but would generate 
interfaces that have size 140KB-260KB excluding payload, which is 
prohibitive.

The joys of inheritance hierarchies!

The design doesn't even address an important aspect that I'm still 
mulling over - how do I model whether the container exposes lvalues or 
is completely encapsulated? For example a trie is unable to expose the 
lvalues of the arrays it holds because it distributes storage.

But wait, there's more.

Containers must expose ranges which face similar issues. Here are my 
incomplete range capabilities:

enum RangeCaps
{
     rvalueElem = 1,
         assignableElem = 2,
         swappableElem = 4,
         infinite = 8,
         noLength = 16,
}

These capabilities are orthogonal on the fundamental range kinds (input, 
forward, double-ended, random). So we're looking at not one thick 
hierarchy, but two!

I decided to let no assumption unchallenged and got back to the 
question: do we really need a container _hierarchy_? How much joy and 
usefulness do people derive from dynamically changing an array with a 
list under the same interface? Or a rb-tree vs. a binary tree vs. a 
left-leaning binary tree?

As far as I can tell such changes of containers are a design-time 
decision, and extremely rarely (never as far as I remember) a runtime 
decision to hide behind a binary interface. I'm starting to think that 
it's quite the other way around, and a Sapir-Whorf kind of thing: The 
fact that Java lacks symbol aliasing capability conditions the thinking 
that a hierarchy is the right thing to do, whereas the true solution is 
to design a flat federation of container types with name- and 
semantics-compatible primitives, to then use them directly or via 
design-time aliases.

Obviously I might be wrong in any number of ways. Switching containers 
dynamically might be very useful. The precision, flexibility, and beauty 
I'm aiming for may be useless. Container hierarchies may have some other 
awesomeness I'm failing to see. But for the time being I'm very 
seriously considering a simple design:

* Each specific container is an unconnected class (or struct, haven't 
decided yet - probably structs to accommodate reference-counted containers);

* Naming matches dictate semantic matches. There should be no false 
friends - if a container defines removeFront, that should mean the same 
thing as for other containers (i.e. O(1) removal of the first element).

This design follows and enriches STL's design even after we have, as a 
community, seen many hierarchy-based designs, and even though it looks 
like hierarchies have "won". The main improvements I want to bring to 
STL's design are (a) a better categorization of containers, (b) of 
course replace iterators with ranges, (c) either eliminate allocators or 
make them work.

Needless to say, I'd be very curious to hear other opinions in this 
matter. Please speak up - the future of D depends, in small part, on 
this too.

Andrei