Suggestion: dynamic array operations

Steve Horne stephenwantshornenospam100 at aol.com
Thu Sep 7 05:34:50 PDT 2006


On Wed, 06 Sep 2006 15:43:42 -0700, Sean Kelly <sean at f4.ca> wrote:

>Gregor Richards wrote:
>> Erm, totally misread everything.  Wowza.

No problem.

To be honest, I regretted posting. I have a habit of being too abrupt
and causing offense. Plus I have a lot of time on my hands ATM, and
getting bored and writing reams on silly stuff can easily end up
looking like criticism. Sorry if that's what happened.

>Not necessarily.  Appending to an array that doubles its size on 
>reallocations is an amortized O(1) operation.

That reminds me of an article in Dr. Dobbs, back when I subscribed.
Got a few good ideas from that. It might be time to get their latest
CD again, if I can afford it ;-)

Array size doubling can sound like a terrible waste of memory, or at
least it did to me when I first read about it. But then again,
efficiency with data structures is all about trade-offs.

I have a thing about multiway trees (B trees, B+ trees, etc). In
memory, not just on disk. Compared with binary trees, they have better
locality and cause less heap fragmentation for the same reasons that
arrays do, though not to the same extreme. But if you like multiway
trees, it is absurd to complain about the memory waste when doing
array size doubling. Both guarantee that the memory allocated is at
least 50% used, barring very very small datasets, but multiway trees
have per-node overheads. And (for B+-style trees) branch nodes that
don't hold data items at all.

Data structures are kind of a horses for courses thing. These days,
you can get away with a lot while using whatever's to hand. Bung a few
thousand items in any container and you won't have much to worry
about. Custom allocators can fix a lot of problems even when they do
show up. Even so, those choices always have costs as well as benefits,
as the dataset gets bigger, and those costs can sometimes be a big
problem.

I was surprised to see hash tables built into D. I had that one in my
'scripting language' stereotype. But it's probably a good thing.

But I still think there should be alternative containers in the
library (assuming they're not there anyway and I've just missed them).
I know the term 'perfect hash', but the term is precisely that - not
'perfect container'! There are things that hash tables can't do (or at
least, can't do very efficiently for large datasets), such as in-order
iteration or picking out an ordered subrange.

-- 
Remove 'wants' and 'nospam' from e-mail.
