Reading unicode chars..

Ali Çehreli via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Tue Sep 2 10:10:55 PDT 2014


On 09/02/2014 07:06 AM, seany wrote:
> How do I read unicode chars that has code points \u1FFF and higher from
> a file?
>
> file.getcw() reads only part of the char, and D identifies this
> character as an array of three or four characters.
>
> Importing std.uni does not change the behavior.
>
> Thank you.

One way is to use std.stdio.File just like you would use stdin and stdout:

import std.stdio;

void main()
{
     string fileName = "unicode_test_file";
     doWrite(fileName);
     doRead(fileName);
}

void doWrite(string fileName)
{
     auto file = File(fileName, "w");
     file.writeln("abcçdef");
}

void doRead(string fileName)
{
     auto file = File(fileName, "r");

     foreach (line; file.byLine) {        // (1)
         foreach (dchar c; line) {        // (2)
             writeln(c);
         }

         import std.range;
         foreach (c; line.stride(1)) {    // (3)
             writeln(c);
         }
     }
}

Notes:

1) To avoid a common gotcha, note that byLine reuses the buffer behind
'line' at every iteration. If you need any part of it to outlive the
current iteration, make a copy (e.g. with .dup or .idup).

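2) dchar is important there: without it, foreach would iterate over the
UTF-8 code units (chars) of the line, and 'ç' would come out as two
separate chars.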

3) Any algorithm that treats a string as a range exposes decoded
dchars. Here, I used stride.
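
To illustrate notes 1 and 2, here is a minimal sketch, separate from the
program above. The helper names countCodeUnits, countCodePoints and
readLines are made up for this example, and the test string is my own
choice: it uses code points above \u1FFF, as in the question, which take
three or four UTF-8 code units each.

```d
import std.stdio;

// Counts UTF-8 code units: this is what foreach sees when the
// loop variable is a char.
size_t countCodeUnits(const(char)[] s)
{
    size_t n;
    foreach (char c; s) {
        ++n;
    }
    return n;
}

// Counts code points: asking for dchar makes foreach decode UTF-8.
size_t countCodePoints(const(char)[] s)
{
    size_t n;
    foreach (dchar c; s) {
        ++n;
    }
    return n;
}

// Hypothetical helper: collects every line of a file. Each line is
// copied with .idup because byLine reuses its buffer.
string[] readLines(string fileName)
{
    string[] lines;
    foreach (line; File(fileName, "r").byLine) {
        lines ~= line.idup;
    }
    return lines;
}

void main()
{
    // 'a', the euro sign (3 code units), and an emoji (4 code units).
    string s = "a\u20AC\U0001F600";
    writeln(countCodeUnits(s));   // 8
    writeln(countCodePoints(s));  // 3

    // Write the string to a file and read it back line by line.
    auto fileName = "unicode_test_file";
    File(fileName, "w").writeln(s);
    writeln(readLines(fileName));
}
```

Without the .idup in readLines, the stored slices would all refer to the
same reused buffer, and they would not even have the string type.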

Ali


