[Issue 10668] Unicode characters, when taken from strings (as char), are not printed correctly
d-bugmail at puremagic.com
Fri Jul 19 02:41:49 PDT 2013
http://d.puremagic.com/issues/show_bug.cgi?id=10668
monarchdodra at gmail.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
                 CC|                            |monarchdodra at gmail.com
         Resolution|                            |INVALID
--- Comment #2 from monarchdodra at gmail.com 2013-07-19 02:41:47 PDT ---
Well... what did you think it was going to print? You have a UTF-8 sequence.
char c = s[0]; will extract the first code*unit* of your string. You want the
first code*point*.
http://www.fileformat.info/info/unicode/char/a3/index.htm
EG: £ is the codepoint U+00A3.
In UTF-8 it is represented by the sequence: [0xC2, 0xA3]
When you write "char c = s[0];", you are extracting the first codeunit, which
is 0xC2. When you pass this to writeln, what happens will mostly depend
on your locale/codepage. If it is set to UTF-8 (CP65001 on Windows), then it will
print the "unknown character", since you passed an incomplete sequence.
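To make the code unit point concrete, here is a minimal sketch that dumps the raw UTF-8 code units of the string. The helper name codeUnits is hypothetical and only for illustration; it assumes the source file is saved as UTF-8.

```d
import std.stdio;

// Hypothetical helper (not part of Phobos): view a string as its raw
// UTF-8 code units without copying.
immutable(ubyte)[] codeUnits(string s)
{
    return cast(immutable(ubyte)[]) s;
}

void main()
{
    string s = "£££";
    auto units = codeUnits(s);

    // Each £ occupies two code units, so three of them take six.
    assert(s.length == 6);
    assert(units[0] == 0xC2 && units[1] == 0xA3);

    // Indexing a string yields a single code unit, here the lead byte only.
    char c = s[0];
    assert(c == 0xC2);

    writefln("%02X %02X", units[0], units[1]); // prints "C2 A3"
}
```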
The correct code you want is:
    dchar c = s.front;
(remember to import std.array to get front).
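As a rough side-by-side sketch of the difference (variable names here are just for illustration), indexing gives a code unit while front decodes a whole code point:

```d
import std.array : front; // makes the range primitive `front` available on arrays
import std.stdio;

void main()
{
    string s = "£££";

    char  unit  = s[0];    // first UTF-8 code unit: 0xC2, not a full character
    dchar point = s.front; // decodes the first code point: U+00A3

    assert(unit == 0xC2);
    assert(point == 0xA3);
    assert(point == '£');

    writeln(point); // written out as a complete UTF-8 sequence

    // foreach with a dchar loop variable also decodes one code point at a time.
    size_t count = 0;
    foreach (dchar c; s)
        ++count;
    assert(count == 3);
}
```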
Another alternative is to simply work from the ground up with dstrings:
module main;
import std.stdio;

void main(string[] args) {
    dstring s = "£££";
    writeln(s);    // Output: £££
    dchar c = s[0];
    writeln(c);    // Output: £
    writeln(s[0]); // Output: £
}
Do you have access to "The D Programming Language"? It has the best
introduction to Unicode/UTF I've read.