Cost of assoc array?

Wed May 14 06:44:39 PDT 2014

On Wednesday, 14 May 2014 at 13:20:40 UTC, Chris wrote:
> On Wednesday, 14 May 2014 at 11:13:10 UTC, John Colvin wrote:
>> On Wednesday, 14 May 2014 at 09:38:10 UTC, Chris wrote:
>>> I have code that uses the following:
>>>
>>> string[][size_t] myArray;
>>>
>>> 1. myArray = [0:["t", "o", "m"], 1:["s", "m", "i", "th"]];
>>>
>>> However, I've found out that I never need an assoc array and 
>>> a "linear" array would be just fine, as in
>>>
>>> 2. myArray = [["t", "o", "m"], ["s", "m", "i", "th"]];
>>>
>>> Is there any huge difference as regards performance and 
>>> memory footprint between the two? Or is 2. basically 1. under 
>>> the hood?
>>
>> If you don't need the features of associative arrays, don't 
>> use them.
>>
>> Normal arrays are much simpler, faster and (due to some 
>> outstanding problems with associative arrays in D) less 
>> bug-prone. Associative arrays, by definition, require a lot 
>> more work behind the scenes for both reading and writing.
>
> The question is, if they are _much_ faster. With this type 
> (string[][size_t]) I haven't encountered any bugs yet. On the 
> other hand, it introduces some rather stilted logic sometimes, 
> as in
>
> foreach (size_t i; 0..myArray.length) {
>   // do something with myArray[i];
> }
>
> because it's not sorted. This, or I sort it first. Anyway, 
> there's always an overhead associated with associative arrays. 
> I'll have to see how big this breaking change would be, and 
> decide, if it's worth it.
>
> Profiling is not really feasible, because for this to work 
> properly, I would have to introduce the change first to be able 
> to compare both. Nothing worse than carefully changing things 
> only to find out, it doesn't really speed up things.

Yes, they are much faster. Normal array indexing is equivalent to 
*(myArray.ptr + index) plus an optional bounds check, whereas 
associative array indexing is a much, much larger job.

Why were you using associative arrays in the first place? Unless 
your keys are somehow sparse* or of a non-integer type there 
isn't any reason to.

* How I see that constraint in that context:

(maxKey - minKey) / nElements > 1 + epsilon
where epsilon is the maximum proportional wasted space you could 
afford in a normal array (emptyElements / usedElements). Bear in 
mind the memory overhead of associative arrays is itself non-zero.

Also, while normal arrays tend to be more cache friendly than 
associative arrays, this isn't true for very sparse arrays.