TDPL: Foreach over Unicode string

Shin Fujishiro rsinfu at gmail.com
Thu Jul 29 20:52:04 PDT 2010


Andrej Mitrovic <andrej.mitrovich at gmail.com> wrote:
> You are right about printf(), I'm getting the correct output with this code:
> 
> import std.stdio, std.stream;
> 
> void main() {
>     string str = "Hall\u00E5, V\u00E4rld!";
>     foreach (dchar c; str) {
>         printf("%c", c);
>     }
>     writeln();
> }
> 
> Hallå, Värld!

The reason printf printed the correct characters is probably that
the console was working in Windows-1252 (a variant of ISO-8859-1).

The ISO-8859-1 (aka Latin-1) coded character set is compatible with
Unicode: the first 256 Unicode code points are identical to Latin-1.
For example, Latin-1 0xE5 corresponds to U+00E5, and both represent the
character å.  Because of this, your console could _coincidentally_ print
Latin-1 compatible Unicode characters.
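
To illustrate the point, here is a minimal sketch of what I assume
happens when a dchar is handed to printf("%c"): only the low 8 bits of
the code point survive, and for code points below U+0100 that byte is
exactly the Latin-1 encoding.  The helper name truncateToBytes is made
up for this example.

```d
import std.stdio;

/// Hypothetical helper: keep only the low byte of each code point,
/// which is effectively what printf("%c", c) does with a dchar.
ubyte[] truncateToBytes(string s)
{
    ubyte[] bytes;
    foreach (dchar c; s)         // foreach decodes UTF-8 into code points
        bytes ~= cast(ubyte) c;  // truncate to 8 bits
    return bytes;
}

void main()
{
    // U+00E5 (å) and U+00E4 (ä) become the bytes 0xE5 and 0xE4,
    // which are also their Latin-1 encodings.
    foreach (b; truncateToBytes("\u00E5\u00E4"))
        writef("0x%02X ", b);
    writeln();
}
```

Any code point at or above U+0100 would be mangled by the same
truncation, so this only works by coincidence for Latin-1 text.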

The reason Sean saw õ and Õ was, I believe, that his console was
working in CP850.  In the CP850 coded character set, 0xE4 = õ and
0xE5 = Õ.
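
A small sketch of how the same byte decodes differently depending on
the console's code page.  The two functions are hypothetical helpers;
the CP850 table contains only the two entries discussed here and is not
a complete mapping.  The output is printed as code point numbers so the
result does not depend on the console's own codeset.

```d
import std.stdio;

/// Latin-1: the byte value is the Unicode code point.
dchar fromLatin1(ubyte b)
{
    return cast(dchar) b;
}

/// CP850, partial table for illustration only (assumption: just the
/// two bytes from this thread, plus ASCII passed through unchanged).
dchar fromCp850(ubyte b)
{
    switch (b)
    {
        case 0xE4: return '\u00F5'; // õ
        case 0xE5: return '\u00D5'; // Õ
        default:
            return b < 0x80 ? cast(dchar) b : cast(dchar) 0xFFFD;
    }
}

void main()
{
    ubyte[] bytes = [0xE5, 0xE4];
    foreach (b; bytes)
        writefln("0x%02X  Latin-1: U+%04X  CP850: U+%04X",
                 b, cast(uint) fromLatin1(b), cast(uint) fromCp850(b));
}
```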

D/Phobos works in Unicode, but the system (console) works in a
different codeset.  As Kagamin pointed out, Phobos must transcode
Unicode to the system's native codeset to print characters correctly
(even on Linux).
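
As a rough sketch of what such a transcoding step looks like, the
following uses std.encoding.transcode from Phobos to convert a UTF-8
string to Latin-1 before it would be written out.  This is not the code
from my branch: the target encoding is hard-coded to Latin-1 as an
assumption, whereas a real solution has to ask the system which codeset
the console uses.  toLatin1Bytes is a name made up for this example.

```d
import std.encoding : Latin1String, transcode;
import std.stdio;

/// Hypothetical helper: transcode UTF-8 to Latin-1 and expose the
/// result as raw bytes.
immutable(ubyte)[] toLatin1Bytes(string s)
{
    Latin1String latin1;
    transcode(s, latin1);   // UTF-8 -> Latin-1
    return cast(immutable(ubyte)[]) latin1;
}

void main()
{
    string utf8 = "Hall\u00E5, V\u00E4rld!";
    auto bytes = toLatin1Bytes(utf8);

    // å and ä take two bytes each in UTF-8 but one byte in Latin-1.
    writefln("UTF-8: %s bytes, Latin-1: %s bytes",
             utf8.length, bytes.length);

    // Dump the transcoded bytes as hex.
    foreach (b; bytes)
        writef("%02X ", b);
    writeln();
}
```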

By the way, I'm working on this problem in a devel branch:

  http://www.dsource.org/projects/phobos/browser/branches/devel/stdio-native-codeset/

The native codeset transcoder (std/internal/stdio/nativechar.d) is done.
Now I'm thinking about how to integrate the conversion facility into the
stdio File framework.


Shin

