The Case For Autodecode
ag0aep6g via Digitalmars-d
digitalmars-d at puremagic.com
Fri Jun 3 12:52:13 PDT 2016
On 06/03/2016 09:09 PM, Steven Schveighoffer wrote:
> Except many chars *do* properly convert. This should work:
>
> char c = 'a';
> dchar d = c;
> assert(d == 'a');
Yeah, that's what I meant by "standalone code unit". Code units that on
their own represent a code point would not be touched.
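
To make "standalone code unit" concrete, here is a minimal sketch. The helper name isStandalone is made up for illustration and is not part of any proposal in this thread; it only assumes that a UTF-8 code unit below 0x80 encodes a code point by itself.

```d
// Hypothetical helper: true if the UTF-8 code unit is a complete
// code point on its own (i.e. it is ASCII).
bool isStandalone(char c)
{
    return c < 0x80;
}

void main()
{
    // ASCII: one code unit is one code point, so widening preserves it.
    char c = 'a';
    dchar d = c;
    assert(isStandalone(c));
    assert(d == 'a');

    // U+00E9 is two UTF-8 code units: 0xC3 0xA9.
    string s = "\u00E9";
    assert(s.length == 2);
    assert(s[0] == 0xC3 && s[1] == 0xA9);
    assert(!isStandalone(s[0]) && !isStandalone(s[1]));

    // Widening one unit of the sequence gives U+00C3, not U+00E9.
    dchar wrong = s[0];
    assert(wrong == 0xC3);
    assert(wrong != '\u00E9');
}
```
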
> As I mentioned in my earlier reply, some kind of "bounds checking" for
> the conversion could be a possibility.
>
> Hm... an interesting possibility:
>
> dchar _dchar_convert(char c)
> {
>     return cast(int)cast(byte)c; // get sign extension for non-ASCII
> }
So when the char's most significant bit is set, this fills the upper
bits of the dchar with 1s, right? And a set most significant bit in a
char means it's part of a multibyte sequence, while in a dchar it means
that the dchar is invalid, because they only go up to U+10FFFF. Huh. Neat.
Does it work for char -> wchar, too?
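
A rough sketch to poke at both cases. The function names are hypothetical, and the explicit outer casts to dchar and wchar are my addition so that the snippet does not depend on implicit integer-to-character conversion. The wchar variant sign-extends through short, which is only my guess at what the 16-bit analogue would look like.

```d
import std.utf : isValidDchar;

// Hypothetical name; same idea as the quoted _dchar_convert.
dchar toDcharSignExtended(char c)
{
    // byte is signed, so units >= 0x80 become negative and the
    // upper 24 bits are filled with 1s when widened to int.
    return cast(dchar) cast(int) cast(byte) c;
}

// Hypothetical 16-bit analogue, assumed for illustration only.
wchar toWcharSignExtended(char c)
{
    return cast(wchar) cast(short) cast(byte) c;
}

void main()
{
    // ASCII is left untouched.
    assert(toDcharSignExtended('a') == 'a');

    // 0xC3 is a lead byte of a two-byte UTF-8 sequence.
    immutable char lead = cast(char) 0xC3;

    dchar d = toDcharSignExtended(lead);
    assert(d == 0xFFFF_FFC3);   // far above U+10FFFF
    assert(!isValidDchar(d));

    wchar w = toWcharSignExtended(lead);
    assert(w == 0xFFC3);        // not a surrogate, still in the BMP
    assert(isValidDchar(w));
}
```

So in this sketch the dchar result is out of range, but the wchar result lands in 0xFF80 .. 0xFFFF, which is neither outside what a wchar can hold nor in the surrogate range. The "invalid by construction" property does not seem to carry over as is.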