[Issue 13686] New: Reading unicode string with readf ("%s") produces a wrong string
via Digitalmars-d-bugs
digitalmars-d-bugs at puremagic.com
Tue Nov 4 10:20:56 PST 2014
https://issues.dlang.org/show_bug.cgi?id=13686
Issue ID: 13686
Summary: Reading unicode string with readf ("%s") produces a
wrong string
Product: D
Version: D2
Hardware: x86_64
OS: Windows
Status: NEW
Severity: enhancement
Priority: P1
Component: DMD
Assignee: nobody at puremagic.com
Reporter: gassa at mail.ru
The following code does not correctly handle Unicode strings.
-----
import std.stdio;

void main () {
    string s;
    readf ("%s", &s);
    writeln (s.length);
    write (s);
}
-----
Example input ("Test." in cyrillic):
-----
Тест.
-----
(hex: D0 A2 D0 B5 D1 81 D1 82 2E 0D 0A)
That is 11 bytes (with '\n'=CR/LF being two bytes on Windows).
Example output:
-----
18
ТеÑÑ.
-----
(hex: C3 90 C2 A2 C3 90 C2 B5 C3 91 C2 81 C3 91 C2 82 2E 0D 0A)
The second line is 19 bytes (again with '\n'=CR/LF being two bytes on Windows).
The reported length (18, counting '\n' as one character, instead of the
expected 10) shows that the problem is in reading, not in writing.
Here, each input byte is treated as a separate code point and then encoded
as UTF-8 on its own: D0 -> C3 90, A2 -> C2 A2, etc.
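
The byte mapping above can be reproduced outside of readf. The following is
only a sketch of the suspected transformation, not the actual readf code:
reencodeBytewise is a hypothetical helper (not part of Phobos) that treats
every input byte as a code point in the range U+0000..U+00FF and encodes it
as UTF-8 with std.utf.encode.

```d
import std.stdio;
import std.utf : encode;

// Hypothetical helper, not part of Phobos: treat every input byte as a
// separate code point (U+0000..U+00FF) and encode that code point as UTF-8.
ubyte[] reencodeBytewise (const(ubyte)[] input) {
    ubyte[] output;
    foreach (b; input) {
        char[4] buf;
        // Bytes below 0x80 stay one byte; bytes 0x80..0xFF become two bytes.
        immutable n = encode (buf, cast(dchar) b);
        output ~= cast(ubyte[]) buf[0 .. n];
    }
    return output;
}

void main () {
    // UTF-8 bytes of the example input, including the trailing CR LF.
    ubyte[] input = [0xD0, 0xA2, 0xD0, 0xB5, 0xD1, 0x81, 0xD1, 0x82,
                     0x2E, 0x0D, 0x0A];
    auto output = reencodeBytewise (input);
    writefln ("%(%02X %)", output); // hex dump of the re-encoded bytes
    writeln (output.length);        // number of bytes after re-encoding
}
```

For the 11 input bytes, the 8 bytes at or above 0x80 double in size and the
remaining 3 do not, giving 8 * 2 + 3 = 19 bytes, which matches the hex dump
of the example output.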
On the bright side, reading the file with readln works properly.
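
A minimal sketch of the readln variant, for comparison with the program
above. readLineFrom is a hypothetical helper introduced here only so that
the same code can read from stdin or from any other File. Note that readln
returns a single line including its terminator, so this matches the readf
program only for one-line input such as the example.

```d
import std.stdio;

// Hypothetical helper: read one line, terminator included, from a File.
string readLineFrom (File f) {
    return f.readln ();
}

void main () {
    string s = readLineFrom (stdin);
    writeln (s.length); // length in UTF-8 code units, terminator included
    write (s);
}
```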
Relevant discussion:
http://forum.dlang.org/thread/rblxsxrdhjtkmxugyvrf@forum.dlang.org