[Issue 10668] Unicode characters, when taken from strings (as char), are not printed correctly

Fri Jul 19 08:24:58 PDT 2013

http://d.puremagic.com/issues/show_bug.cgi?id=10668

--- Comment #4 from Matt Carter <MATTCA at sky.com> 2013-07-19 08:24:57 PDT ---
(In reply to comment #2)
> Well... what did you think it was going to print? You have a UTF-8 sequence.
> char c = s[0]; will extract the first code*unit* of your string. You want the
> first code*point*.
> 
> http://www.fileformat.info/info/unicode/char/a3/index.htm
> EG: £ is the codepoint "A3" (U+00A3)
> In UTF8 it is represented by the sequence: [0xC2, 0xA3]
> 
> When you write "char c = s[0];", you are extracting the first codeunit, which
> is 0xC2. When you pass this to writeln, what happens will mostly depend
> on your locale/codepage. If it is set to UTF-8 (CP65001 on Windows), then it will
> print the "unknown character", since you passed an incomplete sequence.
> 
> The correct code you want is:
> dchar c = s.front;
> 
> (remember to import std.array to get front).
> 
> Another alternative is to simply work from the ground up with dstrings.
> 
> module main;
> 
> import std.stdio;
> 
> void main(string[] args) {
>     dstring s = "£££";
>     writeln(s); // Output: £££
> 
>     dchar c = s[0];
>     writeln(c); // Output: £
> 
>     writeln(s[0]); // Output: £
> }
> 
> Do you have access to "The D Programming Language"? It has the best
> introduction to unicode/UTF I've read.

Thanks for the response! Yeah, after posting I converted my project to use
dstrings on the off chance it would work, and lo and behold, that seems to be
the fix.

I plan on eventually getting the book, although I've read some bad reviews
regarding the e-book/kindle version, so I'm having to wait a little longer to
get a hard copy.
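
For anyone landing on this issue later, here is a minimal sketch of the code unit versus code point distinction discussed above, for the case where the data stays a plain (UTF-8) string rather than being converted to a dstring. It assumes a reasonably recent compiler where front and walkLength are available from std.range.primitives (older releases provided front for arrays via std.array), and it assumes the source file is saved as UTF-8. The variable names are illustrative only.

```d
import std.range.primitives : front, walkLength;
import std.stdio : writeln;
import std.utf : decode;

void main()
{
    // In UTF-8, each £ (U+00A3) is stored as the two code units 0xC2 0xA3.
    string s = "£££";

    // Indexing a string yields a single code unit, not a whole character.
    char unit = s[0];
    assert(unit == 0xC2);

    // length counts code units; walkLength counts decoded code points.
    assert(s.length == 6);
    assert(s.walkLength == 3);

    // front decodes the first full code point from the string.
    dchar viaFront = s.front;
    assert(viaFront == '\u00A3');

    // std.utf.decode does the same explicitly and advances the index
    // past the code units it consumed.
    size_t i = 0;
    dchar viaDecode = decode(s, i);
    assert(viaDecode == '\u00A3');
    assert(i == 2);

    // What this shows on screen still depends on the console code page.
    writeln(viaFront);
}
```

Either approach avoids passing a lone 0xC2 to writeln. Staying with string keeps the UTF-8 storage and decodes on access, while dstring stores one code point per element so that plain indexing works.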
