AA's and mutating keys
Jarrett Billingsley
kb3ctd2 at yahoo.com
Wed Oct 31 16:42:54 PDT 2007
"Steven Schveighoffer" <schveiguy at yahoo.com> wrote in message
news:fgaf4j$1d18$1 at digitalmars.com...
> It's ok if the AA makes a copy of the key, or uses something based on the
> key (like a hashcode).
Not really. Consider:
int[char[]] map;
char[] key = new char[5];
key[] = "hello"[];
map[key] = 8;
key[0] = 'c';
map["cello"] = 2;
What happens is that you insert ("hello" => 8) into the table. But then you
go and modify the key so it's "cello" instead. However now you have an
inconsistent key: the key says "cello" but its hash is the hash of "hello".
So, when you try to change "cello", the new "cello" maps to another slot and
("cello" => 2) is inserted. In fact, if you run this code and then print
out the key value pairs, you'll get something like:
cello 8
cello 2
In fact it's now impossible to modify or remove the original key-value pair
because even if you can get something to hash to the right slot, the keys
won't compare equal, and so the AA implementation won't find it.
> Let's say you were counting the number of occurrences of each word in a
> file. You may do it by reading lines into a buffer. Then for each word
> in the line, increment a counter in an AA that uses strings as the key and
> integers as the value.
>
> If you re-use the buffer, you might overwrite the key that you used to
> insert an element into an AA! But it's not obvious to someone who
> normally codes in C++ or Java, because strings are either pass by value or
> immutable. I've used D for a few months now, and it isn't even obvious to
> me :)
>
> I am running into this very situation, so I want to know whether an AA
> will dup the key automatically or if I have to do it myself.
>
> Plus, if I have to dup the key every time I just increment the count (not
> the first insertion), that is a lot of wasted memory and copying. So I
> have to do something like:
>
> int *x = key in map;
> if(x)
> (*x)++;
> else
> map[key.dup] = 1;
>
> which would be nice if it was done automatically for me :)
Easy, nested functions to the rescue:
char[80] buffer;
int[char[]] map;
void inc(char[] key)
{
int* x = key in map;
if(x)
(*x)++;
else
map[key.dup] = 1;
}
foreach(thing; input)
{
fill buffer;
inc(buffer[0 .. end]);
}
The nice thing about it not doing it automatically for you is performance.
Most of the time you _don't_ need this behavior, and when you do it's easy
to implement it on top of the existing behavior.
> Also, if I do have to dup it myself, then the website should specify that
> keys should not be modified once they are used to insert nodes into an AA.
Agreed.
More information about the Digitalmars-d-learn
mailing list