TDPL: Foreach over Unicode string
Sean Kelly
sean at invisibleduck.org
Tue Jul 27 16:43:42 PDT 2010
Andrej Mitrovic Wrote:
> On Wed, Jul 28, 2010 at 12:34 AM, Sean Kelly <sean at invisibleduck.org> wrote:
>
> > Sean Kelly Wrote:
> > >
> > > I think it's Windows integration that's the problem, on OSX I get:
> > >
> > > [H][a][l][l][?][?][,][ ][V][?][?][r][l][d][!]
> > > [H][a][l][l][å][,][ ][V][ä][r][l][d][!]
> > >
> > > which is essentially correct. The only difference between this and doing
> > > the same thing in C and using printf() in place of write() is that both
> > > lines display correctly in C. I think printf() must be detecting partial
> > > UTF-8 characters and buffering until the complete chunk has arrived.
> > > Interestingly, the C output can't even be broken by badly timed calls to
> > > fflush(), so the buffering is happening at a fairly high level. I'd be
> > > interested in seeing the same thing in write() at some point.
> >
> > Ah, write() already works that way. It was the brackets that were screwing
> > things up.
> >
>
> You are right about printf(), I'm getting the correct output with this code:
>
> import std.stdio, std.stream;
>
> void main() {
>     string str = "Hall\u00E5, V\u00E4rld!";
>     foreach (dchar c; str) {
>         printf("%c", c);
>     }
>     writeln();
> }
>
> Hallå, Värld!
>
> Should I file this as a Windows bug for DMD?
Yes. I looked into this briefly, and after a bit of googling it looks like fwide() isn't implemented on Windows (unless Walter has done this himself in the DMC libraries). See here:
http://blogs.msdn.com/b/michkap/archive/2009/06/23/9797156.aspx
If I change std.stdio.LockingTextWriter.put(C)(C c) to always use the version(Windows) code for a 32-bit argument, it *almost* works correctly. Instead of garbage, the Unicode characters come out as a lowercase o with an accent above (U+01A1, I believe) and an uppercase sigma (U+01A9). I'll have to spend some more time later trying to figure out why it's these characters and not the intended ones. I wouldn't think that endian issues would be relevant, but that's the only thing I've come up with so far.
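
For anyone who wants to poke at this, here is a minimal sketch that dumps the test string both as raw UTF-8 code units and as decoded code points, so the console output can be compared against what the program actually holds. The helper names codeUnitsHex and codePointsHex are made up for illustration and are not part of Phobos; this only inspects the string and does not touch the LockingTextWriter code path.

```d
import std.array : join;
import std.format : format;
import std.stdio : writeln;

// Hypothetical helper: hex dump of the UTF-8 code units (bytes) in s.
string codeUnitsHex(string s)
{
    string[] parts;
    foreach (char c; s)    // char iteration yields raw code units, no decoding
        parts ~= format("%02X", cast(ubyte) c);
    return parts.join(" ");
}

// Hypothetical helper: list of the code points that foreach decodes from s.
string codePointsHex(string s)
{
    string[] parts;
    foreach (dchar c; s)   // dchar iteration decodes UTF-8 on the fly
        parts ~= format("U+%04X", cast(uint) c);
    return parts.join(" ");
}

void main()
{
    string str = "Hall\u00E5, V\u00E4rld!";
    writeln(codeUnitsHex(str));   // one entry per byte
    writeln(codePointsHex(str));  // one entry per character
}
```

Each accented letter should show up as two code units in the first line and as a single code point in the second. Since the output is plain ASCII hex, it should read the same regardless of how the console handles non-ASCII text.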