string comparison

Jonathan M Davis jmdavisProg at gmx.com
Mon Dec 20 11:31:14 PST 2010


On Monday, December 20, 2010 10:44:12 doubleagent wrote:
> > Are you 100% sure that you are running this version
> 
> I have to be.  There are no other versions of phobos on this box and 'which
> dmd' points to the correct binary.
> 
> >  dictionary[word.idup] = newId;
> 
> That fixes it.
> 
> > The 'word' array is mutable and reused by byLine() on each iteration.  By
> > doing the above you use an immutable copy of it as the key instead.
> 
> I REALLY don't understand this explanation.  Why does the mutability of
> 'word' matter when the associative array 'dictionary' assigns keys by
> value...it's got to assign them by value, right?  Otherwise we would only
> get one entry in 'dictionary' and the key would be constantly changing.

Okay. I don't know what the actual code looks like, but word is obviously a 
dynamic array, and if it's from byLine(), then that dynamic array is mutable - 
both the array itself and its elements. Using idup gets you an immutable copy. 
When copying dynamic arrays, you really get a slice of that array. So, you get 
an array that points to the same array as the original. Any changes to the 
elements in one affects the other. If you append to one of them and it doesn't 
have the space to resize in place or you do anything else which could cause it 
to reallocate, then that array is reallocated and they no longer point to the 
same data and changing one will not change the other.
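
A minimal sketch of the slicing behavior described above. The function names 
are made up for illustration and are not from the original program. The second 
function appends to a slice that ends before the end of the original data, 
which I'm assuming is enough to force a reallocation, since growing in place 
would overwrite an element that the original array still uses.

```d
import std.stdio : writeln;

// Hypothetical helper: true if a write through a copied slice is visible
// through the original array.
bool copySharesElements()
{
    char[] a = "hello".dup;   // mutable array, like the one byLine() hands out
    char[] b = a;             // copies the pointer and length, not the elements
    b[0] = 'j';               // writes into the shared elements
    return a == "jello" && a.ptr is b.ptr;
}

// Hypothetical helper: true if a slice that had to reallocate on append
// no longer affects the original array.
bool appendSeparates()
{
    char[] a = "hello".dup;
    char[] c = a[0 .. 3];     // "hel", ends before the end of a's data
    c ~= 'p';                 // cannot grow in place without overwriting a[3]
    c[0] = 'y';               // c has its own storage now
    return a == "hello" && c == "yelp" && a.ptr !is c.ptr;
}

void main()
{
    writeln(copySharesElements());
    writeln(appendSeparates());
}
```

Note that an append does not always reallocate. If the slice being appended to 
ends exactly where the allocated data ends and there is spare capacity, the 
runtime may extend it in place and the two arrays keep sharing their common 
elements.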

If the elements of the array are const or immutable, then the fact that the two 
arrays point to the same data isn't a problem because the elements can't be 
changed (except in cases where you're dealing with const rather than immutable 
and another array points to the same data but doesn't have const elements). So, 
assigning one string to another, for instance (string being an alias for 
immutable(char)[]), will never result in one string altering another. However, 
if you're dealing with char[] rather than string, one array _can_ affect the 
elements of another. I believe that byLine() deals with a char[], not a string.
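
To make the char[] versus string difference concrete, here is a small sketch. 
The buffer is only a stand-in for whatever byLine() reuses internally, and the 
function name is hypothetical.

```d
import std.stdio : writeln;

// Hypothetical helper: true if an idup'ed copy keeps its contents after the
// mutable buffer it was copied from has been overwritten.
bool idupIsIndependent()
{
    char[] buffer = "apple".dup;   // stand-in for a reused line buffer
    char[] view   = buffer;        // mutable slice: shares the elements
    string saved  = buffer.idup;   // immutable copy: has its own elements

    buffer[] = "grape";            // overwrite the buffer's elements in place

    // Elements of a string are immutable, so this assignment is rejected
    // at compile time.
    static assert(!__traits(compiles, saved[0] = 'x'));

    return view == "grape" && saved == "apple";
}

void main()
{
    writeln(idupIsIndependent());
}
```

The mutable slice follows the buffer, while the string produced by idup keeps 
the contents it had when the copy was made.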

Now, as for associative arrays, they don't really deal with const correctly. I 
believe that they're actually implemented with void* and you can actually do 
things like put const elements in them in spite of the fact that toHash() on 
Object is not currently const (there is an open bug on the fact that Object is 
not const-correct). So, it does not surprise me in the least if it will take 
mutable types as its key and then allow them to be altered (assuming that 
they're pointers or reference types and you can therefore have other references 
to them). But to fix the problem in this case would require immutability rather 
than const, because you're dealing with a reference type (well, pseudo-reference 
type since dynamic arrays share their elements such that changes to their 
elements affect all arrays which point to those elements, but other changes - 
such as altering their length - don't affect other arrays and will even likely 
result in the arrays then being completely separate).

> The behavior itself seems really unpredictable prior to testing, and really
> unintended after testing.  I suspect it's due to some sort of a bug.  The
> program, on my box anyway, only fails when we give it identical strings,
> except one is prefixed with a space.  That should tell us that 'splitter'
> and 'strip' didn't do their job properly.  The fly in the ointment is that
> when we output the strings, they appear as we would expect.
> 
> I suspect D does string comparisons (when the 'in' keyword is used) based
> on some kind of a hash, and that hash doesn't get correctly updated when
> 'strip' or 'splitter' is applied, or upon the next comparison or whatever.
>  Calling 'idup' must force the hash to get recalculated.  Obviously, you
> guys would know if there's any merit to this, but it seems to explain the
> problem.

'in' should use toHash() (or the built-in equivalent for built-in types if 
you're not dealing with a struct or class) followed by ==. I'd be stunned if 
there were any caching involved. The problem is that byLine() is using a mutable 
array, so the elements pointed to by the array that you just put in the 
associative array changed, which means that the hash for them is wrong, and == 
will fail when used to compare the array to what it was before.
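
Since I don't have the actual code, the following is only a sketch of the 
pattern under discussion. It simulates byLine() with a single reused buffer 
rather than reading from a file, and countDistinct is a name I made up. The 
lookup hashes and compares the current contents of word, while the key that 
gets stored is an immutable copy which later writes to the buffer cannot touch.

```d
import std.stdio : writeln;

// Hypothetical helper: counts distinct words while reusing one mutable
// buffer for every word, the way byLine() reuses its line buffer.
size_t countDistinct(string[] words)
{
    size_t[string] dictionary;
    char[] buffer;                          // reused on every iteration

    foreach (w; words)
    {
        buffer.length = w.length;
        buffer[] = w[];                     // overwrite the buffer's elements
        char[] word = buffer;               // mutable slice of the buffer

        if (word !in dictionary)            // hash and == on current contents
        {
            size_t newId = dictionary.length;
            dictionary[word.idup] = newId;  // store an independent copy
        }
    }
    return dictionary.length;
}

void main()
{
    writeln(countDistinct(["the", "cat", "the", "hat", "cat"]));
}
```

With the idup in place, the five words above yield three distinct keys.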

> > The advantage with splitter is that it is lazy and therefore more
> > efficient.  split() is eager and allocates memory to hold the string
> > fragments.
> 
> Yeah, that's what I thought would be the answer.  Kudos to you guys for
> thinking of laziness out of the box.  This is a major boon for D.
> 
> You know, there's something this touches on which I was curious about.  If
> D defaults to 'safety first', and with some work you can get
> down-to-the-metal, why doesn't the language default to immutable
> variables, with an explicit modifier for mutable ones?  C compatibility?

C compatibility would be one reason. Familiarity would be another. Also, it 
would be _really_ annoying to have to mark variables mutable all over the place 
as you would inevitably have to do. The way that const and immutable are 
designed in D, to some extent, you can pretty much ignore them if you don't want 
to use them, which some folks like Andrei deem important. Making immutable the 
default would force it on everyone.

- Jonathan M Davis


More information about the Digitalmars-d-learn mailing list