Robustness of the built in associative array

Sun Mar 26 14:20:40 PST 2006

Oskar Linde wrote:
> Hello,
> 
> The latest "Implicit fn template instantiation" thread contains code 
> that contains a hidden and possibly hard to find bug. Rewriting the code 
> with a non-templated version for clarity:
> 
> void increment(int[char[]] map, char[] key) {
>     if (auto t = (key in map))
>         (*t)++;
>     else
>         map[key] = 1;
> }
> 
> To understand the code, one first has to know if the associative array 
> is a value type or a reference type.
> 
> It turns out it is neither. It is definitely not a value type. The above 
> function will in most cases alter the original associative array and 
> work as expected. But it is not a reference type either -- in the few 
> cases where adding a new key to the AA invokes a rehash, all hell breaks 
> loose. The original associative array is suddenly *corrupted*! (The way 
> to make the function work as expected is to make the map an inout 
> argument, but that is beside my point.)
> 
> Internally, an AA is implemented as an aaA*[] -- a dynamic array of 
> pointers to aaA-nodes. The first node of the array contains metadata 
> (the total number of keys), and the remaining [1..$] nodes contains the 
> hash buckets. When passing an aa to a function, the aaA*[] gets passed. 
> I assume the reason for this implementations is efficiency. All you need 
> to do an efficient lookup is the array of buckets. But is this 
> implementation comes at a cost of robustness. To make use of AA's, you 
> have to follow a special rule:
> 
> - Never add an element to or rehash the AA unless you know you hold the 
> only reference.
> 
> This means that even though the AA behaves as a reference type, the data 
> can not be shared unless all references are readonly. I feel I must see 
> this bastardization of being neither value or reference type a design 
> mistake.
> 

Oskar, thanks for pointing that out. That was one issue that I hadn't 
noticed yet, and I agree wholeheartedly that is a quite a design problem.

As a side note: I would prefer saying that these are more like broken 
reference types, rather than saying they are not reference types, since 
that may not be the best way to describe it (not being reference or 
value types is more the nature of the problem of static arrays).
Anyway, in any case you description of the problem is quite clear.

Also, this illustrates another issue, which is the tricky reference 
semantics of dynamic arrays. Dynamic arrays are reference types, but 
they have a "catch", in that unlike all other reference types, there is 
an operation other than assignment that changes the identity of the 
array. It is the length change operation. But what is *really bad*, is 
that it is not "deterministic"[1] whether the resulting array from a 
length change will a brand new one or not. The identity (or perhaps 
better said, the contents) can be all new, or partially the same.

[1] "deterministic" is not the 100% correct word (as it is more the 
opposite of random), but I'm missing a better term.

> A simple fix is to change the AA implementation so that AAs become MyAA* 
>  instead of aaA*[], with MyAA defined as:
> 
> struct MyAA {
>     AANode[] buckets;
>     uint numberOfNodes;
> }
> 
> You get a very slight additional lookup cost (double dereferencing to 
> get to the buckets), but make the AA a pure reference type without any 
> caveats and problems.
> 
> My questions are:
> 
> 1) Why is the D built in array type designed in such a fragile way? 
> Should D not be robust language? Do users have to resort to writing 
> library wrappers around D's types to get robustness? If so:
> 

I'd like to know that too. One more problem to the list, and yet people 
are already thinking about 1.0 and D slogans and whatnot. :/

-- 
Bruno Medeiros - CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D