Container hierarchy vs. container types
Steven Schveighoffer
schveiguy at yahoo.com
Fri Mar 5 04:58:48 PST 2010
On Thu, 04 Mar 2010 18:22:55 -0500, Andrei Alexandrescu
<SeeWebsiteForEmail at erdani.org> wrote:
>
> Needless to say, I'd be very curious to hear other opinions in this
> matter. Please speak up - the future of D depends, in small part, on
> this too.
As you know, you and I disagree on some aspects of containers. However, I
think our philosophies are converging.
Having built a container library, I can tell you the one and only reason I
made it have an interface hierarchy -- Tango. Tango had a container
library, which was based on ancient Doug Lea collections, with some added
features. Because I intended dcollections to replace Tango's collections,
I tried to encompass the same feature set that it had. One of those
features was the ability to use interfaces without knowing the
implementation. Since then, Tango has replaced their container collection
with something different, and guess what -- no more interface hierarchy.
They have a single interface ICollection, and all containers implement
that interface. No Map interface, or List interface or anything.
However, although I think generic containers can avoid *requiring* an
interface, if you design your classes well, slapping on an interface costs
almost nothing. It depends on whether you wish to use classes or structs
to implement containers. I still think classes are better, because
modifying one aspect of a class is easy to do. If someone wants to make a
new type of HashMap that does everything my HashMap does, except changes
one little bit, it's really easy. With the advent of alias this, it's
also possible to do something similar with structs, but not as
straightforward, and not without recompiling. Also, passing containers
around by value by default is one of the aspects of STL that I think
sucks. When working with STL, I almost never passed around a container by
value, I always used a reference, because passing by value can incur large
hidden-allocation costs.
I'll go over a quick set of points that are pro-interface. First, using
an interface hides the implementation. It may not be possible to have
your code on display for the compiler to use. Using an interface is a
perfectly acceptable way to hide proprietary code that you cannot legally
divulge. This is probably the weakest of the points, but I put it out
there. Second, D is a statically compiled language, but with the
(hopefully soon) evolution to dynamic linking, using an interface is
ideal. If you for instance wish to pass a map to or from a plugin
library, using an interface is probably the best way to do it. Interfaces
are less susceptible to implementation changes/differences. Third, code
that uses an interface is compiled once per interface. Code that uses
duck typing is compiled once per set of arguments. While this might not
seem like much, it can reduce the footprint of generated code. Using duck
typing, you may have two almost identical generated functions that differ
only by the function addresses used. Finally, interfaces simplify
understanding. Once you have used an interface, you know "oh yeah, this
is a map, so I can use it like a map." You can strive to build a
container library that follows those principles, even making assertions
that force the compiler to prove those principles, but it's not as easy
for a person to understand as it is to look at interface documentation and
know what it does. This becomes important when using libraries that use
special implementations of containers. Like for instance a database
result or an XML tree.
Interfaces in other languages can be viewed as advantageous in other ways,
but D has advanced compile-time interfaces so far that those don't really
matter in D. For example, declaring that a function requires a map
container can be done with duck typing via conditional compilation.
At the same time, just like I think ranges don't fit every model (*cough*
I/O), interfaces aren't the answer to every aspect of containers. I don't
think ranges fit well with interfaces, because iterating interface ranges
prevents inlining -- the major draw of ranges in the first place -- and
ranges are so much more useful with value semantics. I also think
functions that can be tuned to each implementation should be. For this
reason, dcollections containers provide a lot of functionality that is not
included in the interfaces, simply because the functions are so specific
to the implementation, it would be the only class that implemented that
interface. For example, all the functions that return cursors (and soon
ranges) are not interface functions. This doesn't make them useless via
interfaces, but you cannot use every aspect of the container via an
interface. An interface is like a common denominator, and I think it
should be useful for some purposes. If nobody will ever use the interface
as a parameter to a function, then there is no point in declaring the
interface (I realize that I have created such interfaces in dcollections
and I plan to correct that -- one nice benefit of the contemplation
triggered by this discussion).
I am working on updating dcollections as we post, and I think I have come
up with a nifty way to do ranges that will both retain the usefulness of
the cursor, and provide a common way to plug the collections into
std.algorithm.
Good luck with your containers, I still hold out hope that dcollections
can be integrated in Phobos, but I probably need to get it working in
order to have any chance of competing :)
-Steve
More information about the Digitalmars-d
mailing list