Reading unicode string with readf ("%s")
    Ivan Kazmenko via Digitalmars-d-learn 
    digitalmars-d-learn at puremagic.com
       
    Mon Nov  3 11:37:18 PST 2014
    
    
  
Hi!
The following code does not correctly handle Unicode strings.
-----
import std.stdio;
void main () {
	string s;
	readf ("%s", &s);
	write (s);
}
-----
Example input ("Test." in Cyrillic):
-----
Тест.
-----
(hex: D0 A2 D0 B5 D1 81 D1 82 2E 0D 0A)
Example output:
-----
Ð¢ÐµÑÑ.
-----
(hex: C3 90 C2 A2 C3 90 C2 B5 C3 91 C2 81 C3 91 C2 82 2E 0D 0A)
Here, each input byte appears to be treated as a separate code point
and then re-encoded as UTF-8: D0 -> C3 90, A2 -> C2 A2, etc.
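To illustrate the byte mapping, here is a minimal sketch that reproduces the output bytes from the input bytes. It assumes the effect is the same as widening every input byte to a code point and encoding that code point as UTF-8; I have not checked whether readf actually works this way internally. The helper name reencodeBytewise is made up for this example.

```d
import std.stdio;
import std.utf : encode;

// Hypothetical helper: treats every input byte as a code point
// (U+0000 .. U+00FF) and encodes each one as UTF-8 on its own.
ubyte[] reencodeBytewise(const(ubyte)[] input)
{
    ubyte[] result;
    foreach (b; input)
    {
        char[4] buf;
        // std.utf.encode writes the UTF-8 form of one code point into buf
        // and returns the number of code units written.
        immutable len = encode(buf, cast(dchar) b);
        foreach (c; buf[0 .. len])
            result ~= cast(ubyte) c;
    }
    return result;
}

void main()
{
    // The bytes of the example input, including the trailing CR LF.
    immutable ubyte[] input = [0xD0, 0xA2, 0xD0, 0xB5, 0xD1, 0x81,
                               0xD1, 0x82, 0x2E, 0x0D, 0x0A];
    // Print the result as space-separated hex bytes.
    writefln("%(%02X %)", reencodeBytewise(input));
}
```

If the assumption holds, the printed bytes should match the hex dump of the example output above.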
On the bright side, reading the file with readln works properly.
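For reference, a line-based version along these lines is roughly what I mean. This is only a sketch: it uses byLine with KeepTerminator.yes rather than a bare readln call so that the whole input is read, and the helper name readAllLines is made up for this example.

```d
import std.stdio;

// Hypothetical helper: reads the whole input line by line,
// keeping the line terminators so the text is preserved as-is.
string readAllLines(File input)
{
    string result;
    foreach (line; input.byLine(KeepTerminator.yes))
        result ~= line.idup; // byLine reuses its buffer, so copy each line
    return result;
}

void main()
{
    write(readAllLines(stdin));
}
```

The lines are appended without any per-byte re-encoding, so the UTF-8 sequences in the input should pass through unchanged.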
Is this an expected limitation of reading a string with "%s"?
Could it be made to work somehow?
Is it worth a bug report?
Ivan Kazmenko.
    
    