[Issue 10668] Unicode characters, when taken from strings (as char), are not printed correctly
d-bugmail at puremagic.com
Fri Jul 19 08:24:58 PDT 2013
http://d.puremagic.com/issues/show_bug.cgi?id=10668
--- Comment #4 from Matt Carter <MATTCA at sky.com> 2013-07-19 08:24:57 PDT ---
(In reply to comment #2)
> Well... what did you think it was going to print? You have a UTF-8 sequence.
> char c = s[0]; will extract the first code*unit* of your string. You want the
> first code*point*.
>
> http://www.fileformat.info/info/unicode/char/a3/index.htm
> E.g. £ is the code point U+00A3.
> In UTF-8 it is represented by the sequence: [0xC2, 0xA3]
>
> When you write "char c = s[0];", you are extracting the first code unit, which
> is 0xC2. When you pass this to writeln, what happens will mostly depend
> on your locale/codepage. If it is set to UTF-8 (CP65001 on Windows), then it will
> print the "unknown character", since you passed an incomplete sequence.
>
> The correct code you want is:
> dchar c = s.front;
>
> (remember to import std.array to get front).
>
> Another alternative is to simply work from the ground up with dstrings.
>
> module main;
>
> import std.stdio;
>
> void main(string[] args) {
>     dstring s = "£££";
>     writeln(s); // Output: £££
>
>     dchar c = s[0];
>     writeln(c); // Output: £
>
>     writeln(s[0]); // Output: £
> }
>
> Do you have access to "The D Programming Language"? It has the best
> introduction to unicode/UTF I've read.
Thanks for the response! Yeah, after posting I converted my project to use
dstrings on the off chance it would work, and lo and behold, that seems to be
the fix.
I plan on eventually getting the book, although I've read some bad reviews
regarding the e-book/kindle version, so I'm having to wait a little longer to
get a hard copy.
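
For anyone finding this later, here is a minimal sketch of the code unit versus
code point difference described in comment #2, for a plain (UTF-8) string. It
assumes the source file is saved as UTF-8 and a reasonably current Phobos, where
front lives in std.range.primitives. The helper names firstCodeUnit and
firstCodePoint are hypothetical, made up only for this illustration.

```d
import std.range.primitives : front; // decodes the first code point of a string
import std.stdio : writefln, writeln;

// Hypothetical helper: the first UTF-8 code unit, i.e. what s[0] returns.
ubyte firstCodeUnit(string s)
{
    return cast(ubyte) s[0];
}

// Hypothetical helper: the first full code point, decoded from UTF-8.
dchar firstCodePoint(string s)
{
    return s.front;
}

void main()
{
    string s = "£££"; // assumes the source file is saved as UTF-8

    writefln("0x%02X", firstCodeUnit(s)); // 0xC2, only half of the encoded £
    writeln(firstCodePoint(s));           // £
    writeln(s.length);                    // 6: length counts code units, not characters

    // foreach with an explicit dchar loop variable decodes on the fly.
    foreach (dchar c; s)
        writeln(c);
}
```

So a string can stay a string as long as characters are read through front or a
dchar foreach rather than by index. Switching to dstring, as I did, trades
memory for being able to index characters directly.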