Array, AA Implementations

Mon Oct 19 11:20:45 PDT 2009

== Quote from Andrei Alexandrescu (SeeWebsiteForEmail at erdani.org)'s article
> dsimcha wrote:
> > Since there has been a lot of discussion here lately about how AAs and
> > ArrayBuilders should be implemented, we should set up a website where people
> > can contribute different candidate implementations and comment on them.  It's
> > much easier to know whether something is a good idea when you have a working
> > implementation of it than when you're discussing it in the abstract.  Also,
> > the act of actually implementing something forces you to think of details that
> > get missed when talking about it in the abstract.
> >
> > Especially for AAs, I think the best thing to do would be for everyone who has
> > a good idea to post their code in some central place for comment.  If
> > containers are to be added to Phobos, we could even post AA types that are
> > meant to handle some special case and aren't intended to be general
> > implementations.
> >
> > I already have three AA implementations that I've been prototyping and would
> > like to have reviewed, and I'm sure others have more:
> >
> > 1.  An open-addressed linear chaining AA that's designed for pure speed in the
> > common cases.  Somewhat similar to the current implementation but sacrifices
> > worst-case performance to allow lazy range-based iteration with no additional
> > allocations and more space efficiency and speed in the common cases.
> > 2.  An implementation that uses linear congruential randomized probing,
> > designed with conservative GC and space efficiency in mind.
> > 3.  An implementation I call StaticAA, which does not allow the addition or
> > removal of keys after it is constructed, but in exchange has almost zero space
> > overhead and is very GC-efficient.  It works by maintaining sorted parallel
> > arrays and using binary search.
> That's a great idea. I suggest you to find some comprehensive benchmarks
> against which the performance of AAs can be evaluated.
> Andrei

I'll work on it.  I don't know much about how/where to host a such a website (my
web programming skills are lacking), so I'll ask for volunteers to set this up.
As far as the benchmark, so far the obvious ones seem to be speed and space
efficiency under the following combinations of attributes:

key size/hashing cost:  small vs. large

size of AA:  Small (<100 elements), medium (100-10,000), large(10,000-1,000,000),
enormous(1,000,000+)

hashing performance:  well-behaved/near uniform vs. poorly behaved/worst-case

usage:  insert/remove-heavy vs. read-heavy

If anyone can think of any more, please let me know.  Also, just thought of this
now:  I wonder if it would make sense to use some polymorphism tricks (AA
operations are slow enough that an extra pointer dereference or virtual function
call isn't going to make or break their performance) to allow implementations to
be automatically and dynamically changed depending on some of the attributes
listed above.

For example, maybe binary trees are fastest for small N.  (Hypothetical, haven't
tested it.)  Obviously for sufficiently large N hash tables will be fastest.  We
could create an abstract AA type that automatically switches from binary trees to
hash tables when it makes sense, make this the default builtin AA, and expose both
of the underlying implementations in std.containers for when the user wants more
control over the details rather than DWIM.