Implicit encoding conversion on string ~= int ?
Adam D. Ruppe
destructionator at gmail.com
Sun Jun 23 10:32:29 PDT 2013
On Sunday, 23 June 2013 at 17:12:41 UTC, Marco Leise wrote:
> int b = 228; // CP850 value for 'ä'. Note: fits in a single
> byte!
228 (E4 in hex) is also the Unicode code point for ä (U+00E4),
which becomes the two bytes [195, 164] when encoded as UTF-8. See:
http://www.utf8-chartable.de/unicode-utf8-table.pl?number=512&utf8=dec
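
To check those numbers yourself, here is a minimal sketch in D. It
assumes Phobos's std.utf.encode; utf8Bytes is just a hypothetical
helper name I made up for illustration, not anything from the
original code.

```d
import std.stdio : writeln;
import std.utf : encode;

// Hypothetical helper: returns the UTF-8 bytes for one code point.
ubyte[] utf8Bytes(dchar c)
{
    char[4] buf;                     // a UTF-8 sequence is at most 4 bytes
    immutable len = encode(buf, c);  // number of bytes actually written
    return cast(ubyte[]) buf[0 .. len].dup;
}

void main()
{
    int b = 228;                          // the value from the quoted code
    writeln(utf8Bytes(cast(dchar) b));    // [195, 164]
    writeln(utf8Bytes('a'));              // [97], ASCII stays one byte
}
```

The explicit cast to dchar makes it visible that the int is being
treated as a code point, not as a raw byte.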
While the number 228 would normally fit in a single byte, UTF-8
reserves the high bit of each byte to mark it as part of a
multibyte sequence (this is what keeps it compatible with ASCII),
so any code point > 127 is always encoded as a multibyte sequence
in UTF-8. See:
http://en.wikipedia.org/wiki/UTF-8#Description
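
The marker bits can be spelled out by hand. This is only a sketch
for the two-byte range (U+0080 to U+07FF), and encodeTwoByte is a
hypothetical name of mine; real code should use std.utf instead.

```d
import std.stdio : writefln;

// Hypothetical helper: hand-encodes a code point in U+0080 .. U+07FF.
ubyte[2] encodeTwoByte(uint cp)
{
    assert(cp >= 0x80 && cp <= 0x7FF, "only the two-byte range is handled");
    // Lead byte: marker bits 110, then the top 5 bits of the code point.
    ubyte lead = cast(ubyte)(0xC0 | (cp >> 6));
    // Continuation byte: marker bits 10, then the low 6 bits.
    ubyte cont = cast(ubyte)(0x80 | (cp & 0x3F));
    return [lead, cont];
}

void main()
{
    auto bytes = encodeTwoByte(228);
    writefln("%08b %08b", bytes[0], bytes[1]); // 11000011 10100100
    writefln("%d %d", bytes[0], bytes[1]);     // 195 164
}
```

Both bytes have the high bit set, so neither can be mistaken for an
ASCII character, which always has the form 0xxxxxxx.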