TDPL: Foreach over Unicode string
Sean Kelly
sean at invisibleduck.org
Tue Jul 27 22:17:52 PDT 2010
After some more research, it turns out the situation is more complicated than I realized. First, if I compile this C app using DMC:
#include <stdio.h>
int main()
{
    printf( "Hall\u00E5, V\u00E4rld!" );
    return 0;
}
The output is:
Hallσ, VΣrld!
This is what I was seeing once I started messing with std.stdio. It's an improvement, I suppose, since it's not garbage, but the output is still incorrect if you're expecting Unicode. After a bit of experimenting, it looks like there are two ways to output non-ASCII text correctly on Windows: convert to a multi-byte string (toMBSz) or call WriteConsoleW. Here's a test app and the associated output. Notice how writeln() produces the same output as printf(unicodeString).
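import std.stdio;
import std.string;
import std.utf;
import std.windows.charset;
import core.stdc.stdio : printf;
import core.sys.windows.windows;
void main()
{
    HANDLE h = GetStdHandle(STD_OUTPUT_HANDLE);
    DWORD ignore;
    wchar[] buf = ("\u00E5 \u00E4"w).dup;
    // 1. std.stdio: same output as the plain UTF-8 printf below
    writeln(buf);
    // 2. UTF-8 bytes passed straight to the C runtime
    printf("%s\n", toStringz(toUTF8(buf)));
    // 3. converted to the console's OEM code page (1 == CP_OEMCP)
    printf("%s\n", toMBSz(toUTF8(buf), 1));
    // 4. UTF-16 handed directly to the console; the length must be a DWORD
    WriteConsoleW(h, buf.ptr, cast(DWORD) buf.length, &ignore, null);
}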
prints:
å ä
å ä
å ä
å ä
I'd think having std.stdio call the wide-char output routine would be enough to make things display correctly, but when I tried that, I got the sigma. Figuring out what's going on there will take some more work, and the ultimate fix may end up being in the DMC libraries... I really don't know.
More information about the Digitalmars-d mailing list