TDPL: Foreach over Unicode string
Sean Kelly
sean at invisibleduck.org
Tue Jul 27 15:33:27 PDT 2010
Andrej Mitrovic Wrote:
> On page 123 there's an example of what happens when traversing a unicode string with a char, and on the next page the string is traversed with a dchar, which should fix the output. But I'm getting different results, here's the code and output of the two samples:
>
> import std.stdio;
>
> void main() {
>     string str = "Hall\u00E5, V\u00E4rld!";
>     foreach (c; str) {
>         write('[', c, ']');
>     }
>     writeln();
> }
>
> Prints:
> [H][a][l][l][Ã][¥][,][ ][V][Ã][¤][r][l][d][!]
>
> Second example:
>
> import std.stdio;
>
> void main() {
>     string str = "Hall\u00E5, V\u00E4rld!";
>     foreach (dchar c; str) {
>         write('[', c, ']');
>     }
>     writeln();
> }
>
> Prints:
> [H][a][l][l][å][,][ ][V][ä][r][l][d][!]
>
>
> The second example should print out:
> [H][a][l][l][å][,][ ][V][ä][r][l][d][!]
>
> This is on DMD 2.047 on Windows.
I think the problem is Windows integration. On OSX I get:
[H][a][l][l][?][?][,][ ][V][?][?][r][l][d][!]
[H][a][l][l][å][,][ ][V][ä][r][l][d][!]
which is essentially correct. The first loop iterates by char, so the two bytes of each UTF-8 encoded character are written separately and the terminal can't render them; the second iterates by dchar and gets whole code points. The only difference between this and doing the same thing in C with printf() in place of write() is that both lines display correctly in C. I think printf() must be detecting partial UTF-8 characters and buffering until the complete chunk has arrived. Interestingly, the C output can't even be broken by badly timed calls to fflush(), so the buffering is happening at a fairly high level. I'd be interested in seeing write() do the same at some point.