Robustness of the built-in associative array

Nick Nick_member at pathlink.com
Mon Mar 27 02:42:14 PST 2006


In article <e06hpc$vb6$1 at digitaldaemon.com>, Dave says...
>
>Just curious - what portion of the perf. diff. was that?

I don't remember exactly. I did a btree implementation, but when I saw that it
was slightly slower than the (simpler) list implementation (and definitely not
faster), I just scrapped it.

Note that, from a theoretical viewpoint, btrees only make sense when the
average bucket size is larger than 2 (i.e. when there are more than twice as
many elements as table slots). In practice this only happens when the lookup
table is too small compared to the number of elements, or when you use a bad
hash function. Both will probably occur "in the wild", though.

The DMD AA should keep the btrees. I knew my data beforehand, so I didn't need
them.

>What data types?

The speed tests were done with char[] keys and int values.

> And in your estimation what accounts for the rest of the diff.

For lookups, it is hard to say. I found that using a simple power-of-two table
size and a bitmask for lookups was just as good as messing about with prime
numbers, but it only gave a minuscule performance increase. Perhaps templated
code just means less overhead. In any case, the difference was very small, and
the builtin AA is very fast.

Inserts, however, are a different story. The builtin AA starts with a small
table size (97) and resizes (rehashes) at regular intervals as you insert
elements. Being able to specify the approximate number of elements _in advance_
(thus avoiding the rehashes) gave me a 3x-5x performance increase when
inserting.

>(e.g. are you pre-allocating a bunch of space for the data (as opposed to the
> key) or not using the GC)?

Nope, the default allocator is the GC, and one new node struct is new'ed for
each insert. (But with a malloc or region allocator it becomes much faster.)

Nick

More information about the Digitalmars-d mailing list