Associative array issue

H. S. Teoh hsteoh at quickfur.ath.cx
Wed Jan 23 12:21:34 PST 2013


On Wed, Jan 23, 2013 at 09:07:24PM +0100, Igor Kolesnik wrote:
[...]
> import std.stdio, std.string;
> 
> void main() {
>   uint[string] dic;
>   foreach (line; stdin.byLine) {
>     string[] words = cast(string[])split(strip(line));
>     foreach (word; words) {
>       if (word in dic)
>         continue;
>       uint id = dic.length;
>       dic[word] = id;
>       writeln(id, '\t', word);
>     }
>   }
>   //foreach (k,v; dic)
>   //  writeln(k, '|', v);
> }
> 
> When run it behaves somehow strange. Here is an example of the
> input/output I get
[...]

This is a known issue with stdin.byLine: it is a transient range (that
means it reuses the same buffer for each line read from the input). The
problem is that split returns slices of the line, which ultimately refer
back to the data in that buffer. By the time the next line is read, that
data has been overwritten, so the keys already stored in the associative
array change underneath it. That's why the associative array is messed
up.
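
To see the effect in isolation, here is a minimal sketch that doesn't need
stdin at all. The local array buf is only a hypothetical stand-in for
byLine's internal buffer, and keysAfterBufferReuse is a name I made up for
illustration:

```d
import std.stdio : writeln;

// Returns the keys held by the associative array after the buffer that
// backs the key has been overwritten.
string[] keysAfterBufferReuse() {
    char[] buf = "hello".dup;    // hypothetical stand-in for byLine's buffer
    uint[string] dic;
    dic[cast(string) buf] = 0;   // unsafe cast: the key still points into buf
    buf[] = "world";             // "reading the next line" overwrites buf in place
    return dic.keys;
}

void main() {
    // The stored key now shows the overwritten contents, not "hello".
    writeln(keysAfterBufferReuse());
}
```

The key was inserted under the hash of its old contents, so later lookups
for either the old or the new text can no longer be relied upon.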

There's a slight hint of this problem in your code, in the line that
starts with "string[] words = cast(string[])..." -- in normal D code, you
should not need this kind of cast. In this case, it is an unsafe
operation, because string is immutable(char)[], but the reused buffer
returned by byLine is *not* immutable, so by casting the mutable buffer
to immutable, you've inadvertently introduced yourself to the buffer
reuse issue in byLine. :)
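
If you want the compiler to confirm the types involved, something along
these lines should do it. This is only a sketch using compile-time checks,
nothing here runs at runtime:

```d
import std.range : ElementType;
import std.stdio : stdin;

// byLine hands out mutable char[] lines, not strings.
static assert(is(ElementType!(typeof(stdin.byLine())) == char[]));

// A mutable char[] does not implicitly convert to string, which is why
// the original code only compiled with a cast.
static assert(!is(char[] : string));

// .idup makes an immutable copy, so its result really is a string.
static assert(is(typeof((char[]).init.idup) == string));

void main() {}
```

In other words, the type system was trying to stop you, and the cast told
it to be quiet.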

The correct way to write that line is:

	string[] words = split(strip(line.idup));

which will copy the buffer, thereby ensuring it's safe to keep slices of
it in your associative array, and also return the correct type so that
no cast is necessary.
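
Putting it together, the whole program might look roughly like the sketch
below. The helper name numberWords is my own invention so that the loop can
be fed something other than stdin, and the cast(uint) on dic.length is an
assumption on my part: it is needed on 64-bit targets, where length is
wider than uint.

```d
import std.array : split;
import std.stdio : stdin, writeln;
import std.string : strip;

// Gives each previously unseen word the next free id.
// `lines` can be any range of char[] lines, e.g. stdin.byLine.
uint[string] numberWords(R)(R lines) {
    uint[string] dic;
    foreach (line; lines) {
        // idup copies the line, so the slices made by split stay valid
        // after the underlying buffer is reused.
        foreach (word; split(strip(line.idup))) {
            if (word in dic)
                continue;
            uint id = cast(uint) dic.length;
            dic[word] = id;
            writeln(id, '\t', word);
        }
    }
    return dic;
}

void main() {
    numberWords(stdin.byLine);
}
```

Newer Phobos releases also provide File.byLineCopy, which hands out a
fresh copy of each line and so avoids the explicit idup.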


T

-- 
Notwithstanding the eloquent discontent that you have just respectfully expressed at length against my verbal capabilities, I am afraid that I must unfortunately bring it to your attention that I am, in fact, NOT verbose.


More information about the Digitalmars-d-learn mailing list