TDPL: Foreach over Unicode string

Sean Kelly sean at invisibleduck.org
Tue Jul 27 22:17:52 PDT 2010


After a bit more research, the situation is more complicated than I realized.  First, if I compile this C app using DMC:

#include <stdio.h>

int main()
{
    printf( "Hall\u00E5, V\u00E4rld!" );
    return 0;
}

The output is:

Hallσ, VΣrld!

This is what I was seeing once I started messing with std.stdio.  An improvement, I suppose, since it's not garbage, but the output is still incorrect if you're expecting Unicode.  After a bit of experimenting, it looks like there are two ways to output non-ASCII text correctly in Windows: convert to a multi-byte string (toMBSz) or call WriteConsoleW.  Here's a test app and the associated output.  Notice how writeln() has the same output as printf(unicodeString).

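import std.stdio;
import std.string;
import std.utf;
import std.windows.charset;
import core.sys.windows.windows;

void main()
{
    HANDLE h = GetStdHandle(STD_OUTPUT_HANDLE);
    DWORD ignore;
    wchar[] buf = ("\u00E5 \u00E4"w).dup;

    // 1. The std.stdio path.
    writeln(buf);
    // 2. Raw UTF-8 bytes through the C runtime.
    printf("%s\n", toStringz(toUTF8(buf)));
    // 3. Converted to a multi-byte string; code page 1 is CP_OEMCP.
    printf("%s\n", toMBSz(toUTF8(buf), 1));
    // 4. UTF-16 handed straight to the console API.
    //    The cast only matters where size_t is wider than DWORD.
    WriteConsoleW(h, buf.ptr, cast(DWORD) buf.length, &ignore, null);
}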

prints:

├Ñ ├ñ
├Ñ ├ñ
å ä
å ä

I'd think it should be enough to have std.stdio call the wide char output routine to have things display correctly, but I tried that and that's when I got the sigma.  Figuring out what's going on there will take some more work, and the ultimate fix may end up being in the DMC libraries... I really don't know.


More information about the Digitalmars-d mailing list