Robustness of the built in associative array
Oskar Linde
oskar.lindeREM at OVEgmail.com
Fri Mar 24 06:16:54 PST 2006
Hello,
The latest "Implicit fn template instantiation" thread contains code
that contains a hidden and possibly hard to find bug. Rewriting the code
with a non-templated version for clarity:
void increment(int[char[]] map, char[] key) {
if (auto t = (key in map))
(*t)++;
else
map[key] = 1;
}
To understand the code, one first has to know if the associative array
is a value type or a reference type.
It turns out it is neither. It is definitely not a value type. The above
function will in most cases alter the original associative array and
work as expected. But it is not a reference type either -- in the few
cases where adding a new key to the AA invokes a rehash, all hell breaks
loose. The original associative array is suddenly *corrupted*! (The way
to make the function work as expected is to make the map an inout
argument, but that is beside my point.)
Internally, an AA is implemented as an aaA*[] -- a dynamic array of
pointers to aaA-nodes. The first node of the array contains metadata
(the total number of keys), and the remaining [1..$] nodes contains the
hash buckets. When passing an aa to a function, the aaA*[] gets passed.
I assume the reason for this implementations is efficiency. All you need
to do an efficient lookup is the array of buckets. But is this
implementation comes at a cost of robustness. To make use of AA's, you
have to follow a special rule:
- Never add an element to or rehash the AA unless you know you hold the
only reference.
This means that even though the AA behaves as a reference type, the data
can not be shared unless all references are readonly. I feel I must see
this bastardization of being neither value or reference type a design
mistake.
A simple fix is to change the AA implementation so that AAs become MyAA*
instead of aaA*[], with MyAA defined as:
struct MyAA {
AANode[] buckets;
uint numberOfNodes;
}
You get a very slight additional lookup cost (double dereferencing to
get to the buckets), but make the AA a pure reference type without any
caveats and problems.
My questions are:
1) Why is the D built in array type designed in such a fragile way?
Should D not be robust language? Do users have to resort to writing
library wrappers around D's types to get robustness? If so:
2) Why even keep the builtin AA? With the addition of an opIn()
overload, you could get *exactly* the same functionality from a library
implementation:
Map!(char[],int) map;
rather than:
int[char[]] map;
Cons:
- This slight syntax difference.
Pros:
- The library version would be transparently replaceable by other
implementations when the data or other conditions have different
algorithmic requirements.
- The library implementation could easily be more efficient than the
current built in version by taking advantage of templates.
- The library implementation could without problems be made to work
with types that lack opCompare.
- The library version would not have any strange quirks and gotchas.
(Dangerous usage cases like above)
- Greater library/language decoupling.
- The possibility of implementing an UnsafeMap type with high
performance but less safety for the few cases where the performance
really matters.
In my opinion, D's lack of protection (const arrays, etc) and robustness
against programming mistakes is D's greatest weakness.
My contribution to the slogan thread:
"D - Programming without a lifeline" :)
/Oskar
More information about the Digitalmars-d
mailing list