Reading dchar from UTF-8 stdin

Ali Çehreli acehreli at yahoo.com
Tue Mar 15 15:33:18 PDT 2011


Given that the input stream is UTF-8, it is understandable that the 
following program pulls just one code unit from the standard input (I 
think the console encoding is UTF-8 on my Ubuntu 10.10):

import std.stdio;

void main()
{
     char code;
     readf(" %s", &code);
     writeln(code);       // <-- may write an incomplete character
}

ö is represented by two bytes in the UTF-8 encoding. When ö is fed to 
the program's input, the writeln expression does not produce a complete 
character on the output. That's understandable with char.
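
For reference, a quick check of the encoding (a minimal sketch; the 
byte values are the ones UTF-8 assigns to U+00F6):

import std.stdio;

void main()
{
     string s = "ö";               // U+00F6 is two code units in UTF-8
     foreach (ubyte b; cast(immutable(ubyte)[]) s)
          writef("%02X ", b);      // prints: C3 B6
     writeln();
}

A lone char can hold only the first of those two code units.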

Would you expect all of the bytes to be consumed when a dchar is used 
instead?

import std.stdio;

void main()
{
     dchar code;          // <-- now a dchar
     readf(" %s", &code);
     writeln(code);       // <-- BUG: uses a code unit as a code point!
}

When the input is ö, the output now becomes Ã: readf still consumes 
just the first UTF-8 code unit, 0xC3, and the dchar ends up holding the 
code point U+00C3, which is Ã.

What would you expect to happen?
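
For comparison, one workaround that does consume all of the code units 
is to read a whole line and decode the first code point explicitly with 
std.utf.decode (just a sketch, not a claim about what readf should do):

import std.stdio;
import std.utf;

void main()
{
     string line = stdin.readln();      // e.g. "ö\n"
     size_t index = 0;
     dchar code = decode(line, index);  // consumes 1 to 4 code units
     writeln(code);                     // prints ö when the input is ö
}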

Ali

P.S. Since what is written above is not the same as what was read, I am 
reminded of another issue: would you expect the strings "false" and 
"true" to be accepted as valid input when readf'ed into bool variables?

