TDPL: Foreach over Unicode string

Sean Kelly sean at invisibleduck.org
Tue Jul 27 15:33:27 PDT 2010


Andrej Mitrovic Wrote:

> On page 123 there's an example of what happens when traversing a Unicode string with a char, and on the next page the string is traversed with a dchar, which should fix the output. But I'm getting different results. Here's the code and output of the two samples:
> 
> import std.stdio;
> 
> void main() {
>     string str = "Hall\u00E5, V\u00E4rld!";
>     foreach (c; str) {
>         write('[', c, ']');
>     }
>     writeln();
> }
> 
> Prints:
> [H][a][l][l][Ã][¥][,][ ][V][Ã][¤][r][l][d][!]
> 
> Second example:
> 
> import std.stdio;
> 
> void main() {
>     string str = "Hall\u00E5, V\u00E4rld!";
>     foreach (dchar c; str) {
>         write('[', c, ']');
>     }
>     writeln();
> }
> 
> Prints:
> [H][a][l][l][å][,][ ][V][ä][r][l][d][!]
> 
> 
> The second example should print out:
> [H][a][l][l][å][,][ ][V][ä][r][l][d][!] 
> 
> This is on DMD 2.047 on Windows.
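The two loops differ because a D `string` is an array of UTF-8 code units. `foreach (c; str)` infers a `char` element type and visits one byte at a time, while `foreach (dchar c; str)` decodes whole code points. The minimal standalone program below is a sketch, not taken from the book. It shows that U+00E5 (å) is stored as the two bytes 0xC3 0xA5. Read one byte at a time as Windows-1252 (Latin-1) characters, those bytes are exactly the `Ã` and `¥` in the first output above.

```d
import std.stdio;

void main()
{
    // "å" is a single code point, U+00E5.
    string s = "\u00E5";

    // A string is an array of UTF-8 code units, so this one code point
    // occupies two elements.
    assert(s.length == 2);
    assert(s[0] == 0xC3 && s[1] == 0xA5);

    // With the element type inferred, foreach visits each code unit.
    foreach (c; s)
        writef("%02X ", cast(ubyte) c);   // prints C3 A5
    writeln();

    // Declaring the loop variable as dchar makes foreach decode UTF-8,
    // yielding whole code points instead.
    size_t count;
    foreach (dchar d; s)
    {
        writef("U+%04X ", cast(uint) d);  // prints U+00E5
        ++count;
    }
    writeln();
    assert(count == 1);
}
```

By the same logic, ä (U+00E4) is encoded as 0xC3 0xA4, which accounts for the `[Ã][¤]` pair in the first output.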

I think Windows integration is the problem. On OS X I get:

[H][a][l][l][?][?][,][ ][V][?][?][r][l][d][!]
[H][a][l][l][å][,][ ][V][ä][r][l][d][!]

which is essentially correct. The only difference from doing the same thing in C, with printf() in place of write(), is that in C both lines display correctly. I suspect printf() detects partial UTF-8 characters and buffers them until the complete sequence has arrived. Interestingly, even badly timed calls to fflush() can't break the C output, so the buffering must happen at a fairly high level. I'd be interested in seeing the same behavior in write() at some point.
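The buffering guessed at above can be sketched as a writer that holds back an incomplete trailing UTF-8 sequence until its remaining bytes arrive. This is an illustrative sketch only. `incompleteTail` and `Utf8HoldbackWriter` are hypothetical names, not part of Phobos or the C runtime, and the code does not describe how printf() is actually implemented.

```d
import std.stdio;

/// Hypothetical helper: number of bytes at the end of `buf` that form an
/// incomplete UTF-8 sequence (0 if the tail is complete).
size_t incompleteTail(const(ubyte)[] buf)
{
    // An incomplete sequence is at most 3 bytes long, so scan back at most 3.
    foreach (size_t back; 1 .. 4)
    {
        if (back > buf.length)
            break;
        immutable ubyte b = buf[$ - back];
        if ((b & 0xC0) == 0x80)
            continue;                    // continuation byte: keep scanning
        immutable size_t need = (b & 0x80) == 0x00 ? 1
                              : (b & 0xE0) == 0xC0 ? 2
                              : (b & 0xF0) == 0xE0 ? 3
                              : (b & 0xF8) == 0xF0 ? 4
                              : 1;       // invalid lead byte: don't hold it back
        return need > back ? back : 0;
    }
    return 0; // empty, or only continuation bytes: emit as-is
}

/// Hypothetical writer that never emits a partial UTF-8 sequence.
struct Utf8HoldbackWriter
{
    File sink;
    ubyte[] pending;

    void put(const(ubyte)[] chunk)
    {
        pending ~= chunk;
        immutable keep = incompleteTail(pending);
        sink.rawWrite(pending[0 .. $ - keep]); // complete sequences only
        sink.flush();
        pending = pending[$ - keep .. $].dup;  // keep the partial tail
    }
}

void main()
{
    auto w = Utf8HoldbackWriter(stdout);
    w.put(cast(const(ubyte)[]) "Hall");
    w.put([0xC3]); // lead byte of U+00E5: held back, nothing written
    w.put([0xA5]); // completes the sequence: C3 A5 written together
    w.put(cast(const(ubyte)[]) "!\n");
}
```

The sketch only helps when a sequence is split across consecutive writes. It cannot reassemble bytes that have other output between them, as in the bracketed `char` loop, where each code unit is immediately followed by `]`.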


More information about the Digitalmars-d mailing list