Container hierarchy vs. container types
Andrei Alexandrescu
SeeWebsiteForEmail at erdani.org
Thu Mar 4 15:22:55 PST 2010
Hi everyone,
I'm on a trip to Romania visiting relatives. Being disconnected for a
while, I thought some about container types and started writing some code.
It didn't take long to figure that a classic class hierarchy with
inheritance etc. would contain prohibitively many types (with awkward
names too), or would lose a lot of expressiveness. There are just so
many types of containers, and they have so different characteristics.
My second shot was with a capability-based architecture suggested (in an
unrelated context) by Scott Meyers
(http://www.artima.com/cppsource/codefeatures.html - good article,
recommended regardless). Some capabilities I came up with were:
enum ContCaps : uint
{
hasLength = 1,
hasAssignableLength = 2,
hasPushBack = 4,
hasRemoveBack = 8,
hasPushFront = 16,
hasRemoveFront = 32,
linear = 64,
hasFront = 128,
hasBack = 256,
randomAccess = 512,
}
A container would look like this:
interface Container(T, uint caps = 0) ...
The trick is that any container could contain more or less any free
combination of capabilities. There may be a few combinations that don't
quite make sense, but most do.
The capability-based design prescribes that a container with N
capabilities inherits each and every container with N-1 capabilities
obtained by chopping away one capability in turn. For example, a
container with Container!(int, 19) would inherit Container!(int, 18),
Container!(int, 17), and Container!(int, 3). That was a bit difficult to
implement, but I pulled it off with a string mixin.
The problem is that the number of effective base interfaces grows
exponentially with the number of capabilities. (That's a problem also
experienced by Scott.) If I instantiated a Container with all
capabilities, the compiler would hang forever. If I had 6-7
capabilities, the code would compile after a while but would generate
interfaces that have size 140KB-260KB excluding payload, which is
prohibitive.
The joys of inheritance hierarchies!
The design doesn't even address an important aspect that I'm still
mulling over - how do I model whether the container exposes lvalues or
is completely encapsulated? For example a trie is unable to expose the
lvalues of the arrays it holds because it distributes storage.
But wait, there's more.
Containers must expose ranges which face similar issues. Here are my
incomplete range capabilities:
enum RangeCaps
{
rvalueElem = 1,
assignableElem = 2,
swappableElem = 4,
infinite = 8,
noLength = 16,
}
These capabilities are orthogonal on the fundamental range kinds (input,
forward, double-ended, random). So we're looking at not one thick
hierarchy, but two!
I decided to let no assumption unchallenged and got back to the
question: do we really need a container _hierarchy_? How much joy and
usefulness do people derive from dynamically changing an array with a
list under the same interface? Or a rb-tree vs. a binary tree vs. a
left-leaning binary tree?
As far as I can tell such changes of containers are a design-time
decision, and extremely rarely (never as far as I remember) a runtime
decision to hide behind a binary interface. I'm starting to think that
it's quite the other way around, and a Sapir-Whorf kind of thing: The
fact that Java lacks symbol aliasing capability conditions the thinking
that a hierarchy is the right thing to do, whereas the true solution is
to design a flat federation of container types with name- and
semantics-compatible primitives, to then use them directly or via
design-time aliases.
Obviously I might be wrong in any number of ways. Switching containers
dynamically might be very useful. The precision, flexibility, and beauty
I'm aiming for may be useless. Container hierarchies may have some other
awesomeness I'm failing to see. But for the time being I'm very
seriously considering a simple design:
* Each specific container is an unconnected class (or struct, haven't
decided yet - probably structs to accommodate reference-counted containers);
* Naming matches dictate semantic matches. There should be no false
friends - if a container defines removeFront, that should mean the same
thing as for other containers (i.e. O(1) removal of the first element).
This design follows and enriches STL's design even after we have, as a
community, seen many hierarchy-based designs, and even though it looks
like hierarchies have "won". The main improvements I want to bring to
STL's design are (a) a better categorization of containers, (b) of
course replace iterators with ranges, (c) either eliminate allocators or
make them work.
Needless to say, I'd be very curious to hear other opinions in this
matter. Please speak up - the future of D depends, in small part, on
this too.
Andrei
More information about the Digitalmars-d
mailing list