AA's and mutating keys

Jarrett Billingsley kb3ctd2 at yahoo.com
Wed Oct 31 16:42:54 PDT 2007


"Steven Schveighoffer" <schveiguy at yahoo.com> wrote in message 
news:fgaf4j$1d18$1 at digitalmars.com...
> It's ok if the AA makes a copy of the key, or uses something based on the 
> key (like a hashcode).

Not really.  Consider:

int[char[]] map;
char[] key = new char[5];
key[] = "hello"[];
map[key] = 8;
key[0] = 'c';
map["cello"] = 2;

What happens is that you insert ("hello" => 8) into the table.  But then you 
go and modify the key so it's "cello" instead.  However now you have an 
inconsistent key: the key says "cello" but its hash is the hash of "hello". 
So, when you try to change "cello", the new "cello" maps to another slot and 
("cello" => 2) is inserted.  In fact, if you run this code and then print 
out the key value pairs, you'll get something like:

cello 8
cello 2

In fact it's now impossible to modify or remove the original key-value pair 
because even if you can get something to hash to the right slot, the keys 
won't compare equal, and so the AA implementation won't find it.

> Let's say you were counting the number of occurrences of each word in a 
> file.  You may do it by reading lines into a buffer.  Then for each word 
> in the line, increment a counter in an AA that uses strings as the key and 
> integers as the value.
>
> If you re-use the buffer, you might overwrite the key that you used to 
> insert an element into an AA!  But it's not obvious to someone who 
> normally codes in C++ or Java, because strings are either pass by value or 
> immutable. I've used D for a few months now, and it isn't even obvious to 
> me :)
>
> I am running into this very situation, so I want to know whether an AA 
> will dup the key automatically or if I have to do it myself.
>
> Plus, if I have to dup the key every time I just increment the count (not 
> the first insertion), that is a lot of wasted memory and copying.  So I 
> have to do something like:
>
> int *x = key in map;
> if(x)
>  (*x)++;
> else
>  map[key.dup] = 1;
>
> which would be nice if it was done automatically for me :)

Easy, nested functions to the rescue:

char[80] buffer;
int[char[]] map;

void inc(char[] key)
{
    int* x = key in map;

    if(x)
        (*x)++;
    else
        map[key.dup] = 1;
}

foreach(thing; input)
{
    fill buffer;
    inc(buffer[0 .. end]);
}

The nice thing about it not doing it automatically for you is performance. 
Most of the time you _don't_ need this behavior, and when you do it's easy 
to implement it on top of the existing behavior.

> Also, if I do have to dup it myself, then the website should specify that 
> keys should not be modified once they are used to insert nodes into an AA.

Agreed. 




More information about the Digitalmars-d-learn mailing list