TDPL: Foreach over Unicode string
Shin Fujishiro
rsinfu at gmail.com
Thu Jul 29 20:52:04 PDT 2010
Andrej Mitrovic <andrej.mitrovich at gmail.com> wrote:
> You are right about printf(), I'm getting the correct output with this code:
>
> import std.stdio, std.stream;
>
> void main() {
> string str = "Hall\u00E5, V\u00E4rld!";
> foreach (dchar c; str) {
> printf("%c", c);
> }
> writeln();
> }
>
> Hallå, Värld!
The reason why printf printed the correct characters is probably that
the console was working in Windows-1252 (a variant of ISO-8859-1).
The ISO-8859-1 (aka Latin-1) coded character set is compatible with
Unicode: its 256 code values coincide with the first 256 Unicode code
points. For example, Latin-1 0xE5 corresponds to U+00E5, and both
represent the character å. printf("%c", c) emits only the low byte of c,
so your console could print Unicode characters in the Latin-1 range
correctly, but only _by coincidence_.
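
To illustrate the Latin-1/Unicode overlap described above, here is a
minimal sketch in D. The helper name toLatin1Byte is made up for this
example and is not part of Phobos; it only handles code points up to
U+00FF, where the two code sets share values.

```d
import std.stdio;

// Hypothetical helper: the Latin-1 byte for a code point in U+0000..U+00FF.
// In that range the Unicode code point value and the Latin-1 byte are equal.
ubyte toLatin1Byte(dchar c)
{
    assert(c <= 0xFF, "code point is outside the Latin-1 range");
    return cast(ubyte) c;
}

void main()
{
    string str = "Hall\u00E5, V\u00E4rld!";
    // foreach with dchar decodes the UTF-8 string one code point at a time.
    foreach (dchar c; str)
        writefln("U+%04X -> byte 0x%02X", cast(uint) c, toLatin1Byte(c));
}
```

Every character of this particular string is below U+0100, which is why
truncating each code point to one byte happens to give valid Latin-1.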
Sean saw õ and Õ, I believe, because his console was working in CP850.
In the CP850 coded character set, 0xE4 = õ and 0xE5 = Õ.
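
The same two bytes therefore show up as different characters depending on
the console's code set. A small sketch of that, assuming a deliberately
incomplete CP850 table that covers only the two bytes discussed here
(decodeLatin1 and decodeCp850 are hypothetical names, not Phobos
functions):

```d
import std.stdio;

// Latin-1: the byte value is the Unicode code point.
dchar decodeLatin1(ubyte b)
{
    return cast(dchar) b;
}

// CP850: only the two bytes from this thread are mapped in this sketch.
dchar decodeCp850(ubyte b)
{
    switch (b)
    {
        case 0xE4: return '\u00F5'; // õ
        case 0xE5: return '\u00D5'; // Õ
        default:
            // ASCII is shared; the rest of the table is omitted, so other
            // high bytes yield U+FFFD (replacement character) here.
            return b < 0x80 ? cast(dchar) b : cast(dchar) 0xFFFD;
    }
}

void main()
{
    ubyte[] bytes = [0xE4, 0xE5];
    foreach (b; bytes)
        writefln("byte 0x%02X: Latin-1 U+%04X, CP850 U+%04X",
                 b, cast(uint) decodeLatin1(b), cast(uint) decodeCp850(b));
}
```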
D/Phobos works in Unicode, but the system (console) works in a different
codeset. As Kagamin pointed out, Phobos must transcode Unicode to the
system's native codeset to print characters correctly (even on Linux).
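
As a rough illustration of what such transcoding involves, the sketch
below converts a UTF-8 string to Latin-1 with std.encoding. It assumes the
target codeset is fixed to Latin-1 rather than detected from the system,
and the wrapper name toLatin1 is made up for the example.

```d
import std.encoding : Latin1String, transcode;
import std.stdio;

// Hypothetical wrapper: transcode a UTF-8 string to Latin-1.
Latin1String toLatin1(string utf8)
{
    Latin1String result;
    transcode(utf8, result);
    return result;
}

void main()
{
    string str = "Hall\u00E5, V\u00E4rld!";
    auto latin1 = toLatin1(str);

    // UTF-8 needs two bytes each for å and ä; Latin-1 needs one.
    writefln("UTF-8: %s bytes, Latin-1: %s bytes", str.length, latin1.length);

    // Dump the transcoded bytes in hex instead of writing them raw,
    // so the output does not depend on the console's codeset.
    foreach (b; latin1)
        writef("%02X ", cast(ubyte) b);
    writeln();
}
```

A real solution would have to pick the target codeset from the running
system instead of hard-coding one.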
By the way, I'm working on this problem in a devel branch:
http://www.dsource.org/projects/phobos/browser/branches/devel/stdio-native-codeset/
The native codeset transcoder (std/internal/stdio/nativechar.d) is done.
Now I'm thinking about how to integrate the conversion facility into the
stdio File framework.
Shin