Associative Array: reasonable limits?

Sat Nov 2 17:39:02 PDT 2013

On 11/02/2013 05:16 PM, TheFlyingFiddle wrote:
> On Saturday, 2 November 2013 at 23:58:07 UTC, Charles Hixson wrote:
>> On 11/02/2013 04:49 PM, TheFlyingFiddle wrote:
>>> What are you going to be using the collection for?
>>>
>>>
>> It's basically a lookup table used for translating external codes 
>> (essentially arbitrary) into id#s used internally.  There are LOTS of 
>> id#s that don't correspond to ANY external code, and nearly all the 
>> processing is done only using those id#s.  The codes are used only 
>> during i/o, as they are more intelligible to people.
>
> Well if the codes are only used during i/o the performance of the 
> collection is not rly that important. Since the bottleneck is the i/o 
> anyways.
>
> Saying that my recommendation is to go with the AA since it's rly 
> simple to work with. It also has the benefit that if you have a good 
> hashing function lookup time will generally be O(1). Also this 
> eleminates the need for building a custom datastructure.
>
> It is my understanding that there are no hard limits on the number of 
> elements in the built in AA.(But i canno't say for sure, it's very 
> simple to test though).
>
> Since the collection is never going to be deleted. If the external 
> codes are known at compiletime i would recomend you to generate the 
> collection at compiletime thus avoiding the runtime cost of building 
> the table. (This works for either AA or Sorted Array) Also this 
> eleminates the need to care about insertion times (unless you get 
> unreasonable build times ofc ^^).
>
> Hope this helped a little. Its hard to say if AA or SA is the way to 
> go without profiling the complete application.
>
>
Well, I didn't expect any hard limits, but it isn't easy to test this in 
context.  I'm pretty sure an array implementation could be run without 
impacting the rest of the program (which hasn't yet been written).  It's 
slower, but it's less resource intensive, and it's even smaller.  If AAs 
don't have any hidden penalty, like causing the garbage collector to 
thrash, then that's the way to go (and then, yes, I'll need to test it, 
but I'd really be surprised if THAT step caused a problem).  IIRC I've 
used larger AAs in simple programs without problems, but this one will 
probably run continually for months, and the AA is only one piece, so 
I'm a bit concerned about things that I can't see any easy way to test.  
So I thought I'd ask.
P.S.:  The external codes are NOT known at compile time.  Not even at 
the start of run time.  They are added incrementally, and there is no 
actual rule about what they will be.  (This is a part of the reason for 
all the "about" and "estimated" in my original statement (unless I 
edited them out).)
   And there will actually be two of the structures of slightly 
differing composition and size, but they're essentially the same in 
use.  There's also be a couple of others that are rather different in 
this particular section of the program.  Access to all of these will be 
via a handler class, so I may make all the mentioned tables private and 
handle them with internal classes.  But this is probably more about 
penguins than you really want to know.

So it sounds like I should plan on using AAs, and just keep the array 
option around as a fallback position.

-- 
Charles Hixson