AA key conversion woes

H. S. Teoh hsteoh at quickfur.ath.cx
Fri Apr 20 08:07:35 PDT 2012


On Fri, Apr 20, 2012 at 02:07:05PM +0200, SomeDude wrote:
> On Thursday, 19 April 2012 at 19:31:36 UTC, H. S. Teoh wrote:
[...]
> >Believe me, I would rather not have immutable keys if I could help
> >it, because it makes things so much simpler. However:
> >
> >(1) It leads to subtle breakages due to unintended aliasing:
> >
> >	int[int[]] aa;
> >	int[] key1 = [1,2,3];
> >	aa[key1] = 123;
> >	key1[0] = 2;	// uh oh!
> >	assert(aa[[1,2,3]] == 123);	// assertion fails, key has
> >					// changed without AA's knowledge
> >
> 
> Yeah, I thought about that, but then it's the programmer's
> responsibility to not shoot himself in the foot, and if he wants to be
> safe, he can still declare his keys immutable from the start. My
> concern is imposing a useless copy upon the programmer who has mutable
> keys for the sake of using the AA. There should be zero overhead. If
> for some reason he is forced to do that, he will end up writing his
> own implementation.

Copying is only done when actually adding a new key to the AA. For
example, since char[], const(char)[], and string (==immutable(char)[])
are all mutually comparable, nothing needs to be copied until the key
needs to be put inside the container, at which point it seems reasonable
to copy it. Pure lookups and stuff have no overhead.

Furthermore, the programmer is not forced to have the copy. There's
always cast(immutable) to forcefully coerce the key into immutable so
that no copy is ever done. Yes, it does bypass the type system, but
that's what the programmer signed up for when he decided to use mutable
keys.


> >(2) It tickles some compiler bugs/language ambiguities that causes
> >things like this to not work as expected:
> >
> >	int[char[]] aa;
> >	aa["abc"] = 123;	// compile error: cannot convert string to
> >char[]
> >	const(char)[] key1 = "abc";	// works
> >	aa[key1] = 123;		// compile error: cannot convert const(char)[]
> >to char[]
> >
> 
> OK, aa["abc".dup] = 123 works, though.

Which actually introduces overhead, since if the AA member functions
take const, the compiler will actually automatically infer the type of
"abc" as const(char)[] and so there is no .dup overhead at all.


> I don't know if there is a better way to do this. Anyway, I agree it's
> super ugly code, but how many times does the program do this copy ?

If you have the .dup, then it happens every time you do the lookup of a
literal. The correct solution is to have the compiler infer the type of
the literal correctly.


> Once at the AA initialization (because you are not going to insert
> millions of string litterals by hand in your code).

It's not just initialization, it's every time you lookup an entry by a
literal key.


> How many times does it need to copy the other keys ? Zero. OTOH, if
> you force the keys to be immutable, your string litteral will be nicer
> looking, but you may force a million copies of char[] to make them
> immutable keys (most often, the keys will be determined by the
> incoming data, not your string litterals).

You could just create immutable keys in the first place.


> Performance and memory wise, it's very bad, and when the programmer
> sees there is no possible workaround, he won't be happy.

Use cast(immutable) if you absolutely have to use mutable keys.


> >Having immutable keys (or at least const keys) allows the AA
> >implementation to specify const(keytype) in opIndex's parameters, so
> >that you can pass in char[], const(char)[], or string, and it will
> >work as expected.
> 
> Yeah, but that's a "comfort" for initialization, when you have to
> deal with the million real data, you'll force a duplication, and
> that's bad.

No, it happens with lookups too.


> Worse, if after that, the programmer needs to get .keys, he will have
> to perform another cast on all the elements of the array to remove the
> immutable afterwards because he wants to use the keys with the rest of
> his code which deals with mutable data.
[...]

Mutating keys returned by .keys is bad, because it breaks the AA. For
example:

	int[char[]] aa;
	char[] key = [1,2,3];
	aa[key] = 123;
	auto kk = aa.keys;
	kk[0][0] = 2;
	// Now AA is broken, the entry cannot be looked up any more
	// because the internally computed hash value is now wrong and
	// will never match the key again. It's now a ghost entry: you
	// can't look it up or delete it, and adding the same key will
	// create a duplicate entry. Sortof like a memory leak.

To have correct behaviour here, you *need* to duplicate the keys. So you
*want* the type system to enforce this here.

Obviously, if the key is returned by value, then it shouldn't be
immutable, since changing a copy is harmless. But reference types
definitely should be immutable. Otherwise you'll get tons of bugs from
newbies who inadvertently change the keys and wonder why the AA stopped
working.


T

-- 
A bend in the road is not the end of the road unless you fail to make the turn. -- Brian White


More information about the Digitalmars-d mailing list