Reading unicode string with readf ("%s")

Tue Nov 4 10:25:14 PST 2014

On Tuesday, 4 November 2014 at 13:01:48 UTC, anonymous wrote:
> On Monday, 3 November 2014 at 19:37:20 UTC, Ivan Kazmenko wrote:
>> Hi!
>>
>> The following code does not correctly handle Unicode strings.
>> -----
>> import std.stdio;
>> void main () {
>> 	string s;
>> 	readf ("%s", &s);
>> 	write (s);
>> }
>> -----
>>
>> Example input ("Test." in Cyrillic):
>> -----
>> Тест.
>> -----
>> (hex: D0 A2 D0 B5 D1 81 D1 82 2E 0D 0A)
>>
>> Example output:
>> -----
>> Ð¢ÐµÑÑ.
>> -----
>> (hex: C3 90 C2 A2 C3 90 C2 B5 C3 91 C2 81 C3 91 C2 82 2E 0D 0A)
>>
>> Here, the input bytes are handled separately: D0 -> C3 90, A2 
>> -> C2 A2, etc.
>>
>> On the bright side, reading the file with readln works 
>> properly.
>>
>> Is this an expected shortcoming of "%s"-reading a string?
>
> No.
>
>> Could it be made to work somehow?
>
> Yes. std.stdio.LockingTextReader is to blame:
>
> void main()
> {
>      import std.stdio;
>      auto ltr = LockingTextReader(std.stdio.stdin);
>      write(ltr);
> }
> ----
> $ echo Тест | rdmd test.d
> Ð¢ÐµÑÑ
>
> LockingTextReader has a dchar front. But it doesn't do any 
> decoding. The dchar front is really a char front.
>
>> Is it worth a bug report?
>
> Yes.
>
>> Ivan Kazmenko.

You nailed it!
I reported the bug in its original form:
https://issues.dlang.org/show_bug.cgi?id=13686
Perhaps your reduction would be useful.
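
For the record, the garbled bytes are consistent with each UTF-8 code unit being widened to a dchar and then encoded again on output. Below is a minimal sketch of that effect, not the actual LockingTextReader code: widenEachByte is a hypothetical helper written only for illustration, and it assumes the reader hands out raw code units as if they were code points.

```d
import std.stdio;
import std.utf : encode;

// Hypothetical helper (not part of Phobos): mimics the suspected fault by
// treating every UTF-8 code unit as if it were a complete code point.
string widenEachByte(string input)
{
    string result;
    foreach (char c; input)          // iterate code units, no decoding
    {
        char[4] buf;
        immutable len = encode(buf, cast(dchar) c); // re-encode as UTF-8
        result ~= buf[0 .. len].idup;
    }
    return result;
}

void main()
{
    auto garbled = widenEachByte("Тест.");
    // Dump the bytes in hex to compare with the output in the report.
    foreach (ubyte b; cast(const(ubyte)[]) garbled)
        writef("%02X ", b);
    writeln();
}
```

For "Тест." this should print C3 90 C2 A2 C3 90 C2 B5 C3 91 C2 81 C3 91 C2 82 2E, which matches the hex dump in the original report apart from the trailing line break.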