Reading unicode string with readf ("%s")
Ivan Kazmenko via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Tue Nov 4 10:25:14 PST 2014
On Tuesday, 4 November 2014 at 13:01:48 UTC, anonymous wrote:
> On Monday, 3 November 2014 at 19:37:20 UTC, Ivan Kazmenko wrote:
>> Hi!
>>
>> The following code does not correctly handle Unicode strings.
>> -----
>> import std.stdio;
>> void main () {
>>     string s;
>>     readf ("%s", &s);
>>     write (s);
>> }
>> -----
>>
>> Example input ("Test." in cyrillic):
>> -----
>> Тест.
>> -----
>> (hex: D0 A2 D0 B5 D1 81 D1 82 2E 0D 0A)
>>
>> Example output:
>> -----
>> ТеÑÑ.
>> -----
>> (hex: C3 90 C2 A2 C3 90 C2 B5 C3 91 C2 81 C3 91 C2 82 2E 0D 0A)
>>
>> Here, the input bytes are handled separately: D0 -> C3 90, A2
>> -> C2 A2, etc.
>>
>> On the bright side, reading the file with readln works
>> properly.
>>
>> Is this an expected shortcoming of "%s"-reading a string?
>
> No.
>
>> Could it be made to work somehow?
>
> Yes. std.stdio.LockingTextReader is to blame:
>
> void main()
> {
>     import std.stdio;
>     auto ltr = LockingTextReader(std.stdio.stdin);
>     write(ltr);
> }
> ----
> $ echo Тест | rdmd test.d
> ТеÑÑ
>
> LockingTextReader has a dchar front. But it doesn't do any
> decoding. The dchar front is really a char front.
>
>> Is it worth a bug report?
>
> Yes.
>
>> Ivan Kazmenko.
You nailed it!
I reported the bug in its original form:
https://issues.dlang.org/show_bug.cgi?id=13686
Perhaps your reduction would be useful there.
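
For reference, here is a minimal sketch of the byte-by-byte mis-encoding seen in the hex dumps (D0 -> C3 90, A2 -> C2 A2, and so on). It assumes the effect is equivalent to widening each input byte to a dchar without decoding and then re-encoding it as UTF-8; it does not use LockingTextReader itself, and the helper name misencode is made up for this illustration.

```d
import std.stdio;
import std.utf : encode;

// Hypothetical helper: treat every input byte as its own code point
// and re-encode it as UTF-8. This only models the observed symptom;
// it is not the actual std.stdio code path.
ubyte[] misencode(const(ubyte)[] input)
{
    ubyte[] result;
    foreach (b; input)
    {
        char[4] buf;
        // Widen the raw byte to a dchar with no UTF-8 decoding, then encode it.
        immutable len = encode(buf, cast(dchar) b);
        result ~= cast(ubyte[]) buf[0 .. len];
    }
    return result;
}

void main()
{
    // UTF-8 bytes of the example input, copied from the hex dump
    // in the report (without the trailing 0D 0A).
    immutable ubyte[] input =
        [0xD0, 0xA2, 0xD0, 0xB5, 0xD1, 0x81, 0xD1, 0x82, 0x2E];

    // Prints: C3 90 C2 A2 C3 90 C2 B5 C3 91 C2 81 C3 91 C2 82 2E
    writefln("%(%02X %)", misencode(input));
}
```

The printed bytes match the example output above, which is consistent with the explanation that the dchar front is really a char front.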