[Issue 10668] Unicode characters, when taken from strings (as char), are not printed correctly

Fri Jul 19 02:41:49 PDT 2013

http://d.puremagic.com/issues/show_bug.cgi?id=10668

monarchdodra at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
                 CC|                            |monarchdodra at gmail.com
         Resolution|                            |INVALID

--- Comment #2 from monarchdodra at gmail.com 2013-07-19 02:41:47 PDT ---
Well... what did you think it was going to print? You have a UTF-8 sequence.
char c = s[0]; will extract the first code*unit* of your string. You want the
first code*point*.

http://www.fileformat.info/info/unicode/char/a3/index.htm
EG: £ is the codepoint "A3" (U+00A3).
In UTF-8 it is represented by the sequence: [0xC2, 0xA3]

When you write "char c = s[0];", you are extracting the first codeunit, which
is 0xC2. When you pass this to writeln, what happens will mostly depend
on your locale/codepage. If it is set to UTF-8 (CP65001 on Windows), then it will
print the "unknown character", since you passed an incomplete sequence.
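
To see this for yourself, here is a minimal sketch that dumps the code units instead of printing them as characters. The helper name firstCodeUnit is made up for this example, and it assumes the source file is saved as UTF-8:

```d
import std.stdio;
import std.string : representation;

// Hypothetical helper for illustration: the first UTF-8 code unit
// of a string, as a plain byte. Assumes s is not empty.
ubyte firstCodeUnit(string s)
{
    return s.representation[0];
}

void main()
{
    string s = "£££";

    // .length counts code units, not characters: 3 x 2 bytes.
    writeln(s.length); // 6

    // s[0] is only the lead byte of the two-byte sequence for £.
    writefln("0x%02X", firstCodeUnit(s)); // 0xC2

    // The second code unit completes the sequence.
    writefln("0x%02X", s.representation[1]); // 0xA3
}
```

Printing the numeric values sidesteps the codepage question entirely, so the result should be the same on any console.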

The correct code you want is:
dchar c = s.front;

(remember to import std.array to get front).
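
As a rough sketch of what that looks like in a full program (firstCodePoint is a name invented here for illustration; std.utf.decode is shown as well, since it also reports how many code units were consumed):

```d
import std.array; // brings in front for strings
import std.stdio;
import std.utf : decode;

// Hypothetical helper for illustration: the first code point of a
// non-empty UTF-8 string, decoded from as many code units as needed.
dchar firstCodePoint(string s)
{
    return s.front;
}

void main()
{
    string s = "£££";

    dchar c = firstCodePoint(s);
    writefln("U+%04X", cast(uint) c); // U+00A3

    // decode does the same job and advances index past the sequence.
    size_t index = 0;
    dchar d = decode(s, index);
    writeln(d == c);  // true
    writeln(index);   // 2

    // Iterating with an explicit dchar also decodes code points.
    size_t count = 0;
    foreach (dchar ch; s)
        ++count;
    writeln(count);   // 3
}
```

This keeps the string itself as UTF-8 and only decodes at the point where you need a whole character.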

Another alternative is to simply work from the ground up with dstrings.

module main;

import std.stdio;

void main(string[] args) {
    dstring s = "£££";
    writeln(s); // Output: £££

    dchar c = s[0];
    writeln(c); // Output: £

    writeln(s[0]); // Output: £
}

Do you have access to "The D Programming Language"? It has the best
introduction to unicode/UTF I've read.
