Container hierarchy vs. container types

Fri Mar 5 04:58:48 PST 2010

On Thu, 04 Mar 2010 18:22:55 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail at erdani.org> wrote:

>
> Needless to say, I'd be very curious to hear other opinions in this  
> matter. Please speak up - the future of D depends, in small part, on  
> this too.

As you know, you and I disagree on some aspects of containers.  However, I  
think our philosophies are converging.

Having built a container library, I can tell you the one and only reason I  
made it have an interface hierarchy -- Tango.  Tango had a container  
library, which was based on ancient Doug Lea collections, with some added  
features.  Because I intended dcollections to replace Tango's collections,  
I tried to encompass the same feature set that it had.  One of those  
features was the ability to use interfaces without knowing the  
implementation.  Since then, Tango has replaced their container collection  
with something different, and guess what -- no more interface hierarchy.   
They have a single interface ICollection, and all containers implement  
that interface.  No Map interface, or List interface or anything.

However, although I think generic containers can avoid *requiring* an  
interface, if you design your classes well, slapping on an interface costs  
almost nothing.  It depends on whether you wish to use classes or structs  
to implement containers.  I still think classes are better, because  
modifying one aspect of a class is easy to do.  If someone wants to make a  
new type of HashMap that does everything my HashMap does, except changes  
one little bit, it's really easy.  With the advent of alias this, it's  
also possible to do something similar with structs, but not as  
straightforward, and not without recompiling.  Also, passing containers  
around by value by default is one of the aspects of STL that I think  
sucks.  When working with STL, I almost never passed around a container by  
value, I always used a reference, because passing by value can incur large  
hidden-allocation costs.

I'll go over a quick set of points that are pro-interface.  First, using  
an interface hides the implementation.  It may not be possible to have  
your code on display for the compiler to use.  Using an interface is a  
perfectly acceptable way to hide proprietary code that you cannot legally  
divulge.  This is probably the weakest of the points, but I put it out  
there.  Second, D is a statically compiled language, but with the  
(hopefully soon) evolution to dynamic linking, using an interface is  
ideal.  If you for instance wish to pass a map to or from a plugin  
library, using an interface is probably the best way to do it.  Interfaces  
are less susceptible to implementation changes/differences.  Third, code  
that uses an interface is compiled once per interface.  Code that uses  
duck typing is compiled once per set of arguments.  While this might not  
seem like much, it can reduce the footprint of generated code.  Using duck  
typing, you may have two almost identical generated functions that differ  
only by the function addresses used.  Finally, interfaces simplify  
understanding.  Once you have used an interface, you know "oh yeah, this  
is a map, so I can use it like a map."  You can strive to build a  
container library that follows those principles, even making assertions  
that force the compiler to prove those principles, but it's not as easy  
for a person to understand as it is to look at interface documentation and  
know what it does.  This becomes important when using libraries that use  
special implementations of containers.  Like for instance a database  
result or an XML tree.

Interfaces in other languages can be viewed as advantageous in other ways,  
but D has advanced compile-time interfaces so far that those don't really  
matter in D.  For example, declaring that a function requires a map  
container can be done with duck typing via conditional compilation.

At the same time, just like I think ranges don't fit every model (*cough*  
I/O), interfaces aren't the answer to every aspect of containers.  I don't  
think ranges fit well with interfaces, because iterating interface ranges  
prevents inlining -- the major draw of ranges in the first place -- and  
ranges are so much more useful with value semantics.  I also think  
functions that can be tuned to each implementation should be.  For this  
reason, dcollections containers provide a lot of functionality that is not  
included in the interfaces, simply because the functions are so specific  
to the implementation, it would be the only class that implemented that  
interface.  For example, all the functions that return cursors (and soon  
ranges) are not interface functions.  This doesn't make them useless via  
interfaces, but you cannot use every aspect of the container via an  
interface.  An interface is like a common denominator, and I think it  
should be useful for some purposes.  If nobody will ever use the interface  
as a parameter to a function, then there is no point in declaring the  
interface (I realize that I have created such interfaces in dcollections  
and I plan to correct that -- one nice benefit of the contemplation  
triggered by this discussion).

I am working on updating dcollections as we post, and I think I have come  
up with a nifty way to do ranges that will both retain the usefulness of  
the cursor, and provide a common way to plug the collections into  
std.algorithm.

Good luck with your containers, I still hold out hope that dcollections  
can be integrated in Phobos, but I probably need to get it working in  
order to have any chance of competing :)

-Steve