Reading unicode string with readf ("%s")

anonymous via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Tue Nov 4 05:01:47 PST 2014


On Monday, 3 November 2014 at 19:37:20 UTC, Ivan Kazmenko wrote:
> Hi!
>
> The following code does not correctly handle Unicode strings.
> -----
> import std.stdio;
> void main () {
> 	string s;
> 	readf ("%s", &s);
> 	write (s);
> }
> -----
>
> Example input ("Test." in Cyrillic):
> -----
> Тест.
> -----
> (hex: D0 A2 D0 B5 D1 81 D1 82 2E 0D 0A)
>
> Example output:
> -----
> Ð¢ÐµÑÑ.
> -----
> (hex: C3 90 C2 A2 C3 90 C2 B5 C3 91 C2 81 C3 91 C2 82 2E 0D 0A)
>
> Here, the input bytes are handled separately: D0 -> C3 90, A2 
> -> C2 A2, etc.
>
> On the bright side, reading the file with readln works properly.
>
> Is this an expected shortcoming of "%s"-reading a string?

No.

> Could it be made to work somehow?

Yes. std.stdio.LockingTextReader is to blame:

void main()
{
      import std.stdio;
      auto ltr = LockingTextReader(std.stdio.stdin);
      write(ltr);
}
----
$ echo Тест | rdmd test.d
ТеÑÑ

LockingTextReader declares its front as dchar, but it doesn't do any 
UTF-8 decoding: each byte read from the stream comes back as a 
separate dchar. The dchar front is really a char front.
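
To make the effect concrete, here is a minimal sketch that imitates the behaviour described above without touching LockingTextReader itself. The helper names widenBytewise and hexDump are made up for this illustration; they are not part of Phobos. Each UTF-8 byte is widened to a dchar as-is and the result is encoded back to UTF-8, which reproduces the bytes from the original report (minus the trailing 0D 0A):

```d
import std.conv : to;
import std.format : format;
import std.stdio : writeln;
import std.string : representation;

/// Hypothetical helper: treats every UTF-8 code unit as a complete
/// code point (no decoding), then re-encodes the result as UTF-8.
string widenBytewise(string input)
{
    dstring widened;
    foreach (ubyte b; input.representation)
        widened ~= cast(dchar) b; // D0 becomes U+00D0, A2 becomes U+00A2, ...
    return widened.to!string;     // each code point is encoded as UTF-8 again
}

/// Hypothetical helper: formats the bytes of a string as uppercase hex.
string hexDump(string s)
{
    return format("%(%02X %)", s.representation);
}

void main()
{
    string input = "Тест."; // source file assumed to be saved as UTF-8
    writeln(hexDump(input));                // D0 A2 D0 B5 D1 81 D1 82 2E
    writeln(hexDump(widenBytewise(input))); // C3 90 C2 A2 C3 90 C2 B5 C3 91 C2 81 C3 91 C2 82 2E
}
```

The second line matches the "D0 -> C3 90, A2 -> C2 A2" pattern from the question, which is what you would expect if the bytes are never decoded before being handed out as dchars.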

> Is it worth a bug report?

Yes.
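
In the meantime, since readln handles the input properly, one possible workaround is to collect the input line by line instead of using readf ("%s"). This is only a sketch: readAll is a name invented here, and it assumes that reading everything up to end of input is what the "%s" call was meant to do.

```d
import std.stdio;

/// Hypothetical helper: reads everything left in `f`, line by line,
/// keeping the line terminators so the bytes pass through unchanged.
string readAll(File f)
{
    string result;
    foreach (line; f.byLine(KeepTerminator.yes))
        result ~= line.idup; // byLine reuses its buffer, so copy each line
    return result;
}

void main()
{
    write(readAll(stdin));
}
```

No per-character decoding happens on this path; the lines are copied as char arrays, so the UTF-8 sequences stay intact.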

> Ivan Kazmenko.

