The Case Against Autodecode

Timon Gehr via Digitalmars-d digitalmars-d at puremagic.com
Thu Jun 2 15:11:30 PDT 2016


On 02.06.2016 23:56, Walter Bright wrote:
> On 6/2/2016 1:12 PM, Timon Gehr wrote:
>> ...
>> It is not
>> meaningful to compare utf-8 and utf-16 code units directly.
>
> Yes, you have a good point. But we do allow things like:
>
>     byte b;
>     if (b == 10000) ...
>

Well, this is a somewhat different case, because 10000 is just not 
representable as a byte. Every value that fits in a byte also fits in an 
int, though, so the comparison still makes sense (it is simply never true).
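A minimal D sketch of the byte case. It assumes nothing beyond D's usual integral promotion; the variable names are illustrative. The byte is widened to int before the comparison, and that widening is lossless, so the test is well-defined and merely always false.

```d
// Sketch: comparing a byte against an int literal that a byte cannot hold.
void main()
{
    byte b = 100;

    // b is promoted to int for the comparison. The widening is lossless,
    // because every byte value (-128 .. 127) is representable as an int.
    int widened = b;
    assert(widened == 100);

    // 10000 has no byte representation, so this can never be true,
    // but it is still a meaningful integer comparison.
    assert(!(b == 10000));
}
```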

It's different for code units: they are incompatible in both directions. 
A dchar obviously does not fit in a char, and while the lower half of char 
(0x00-0x7F, i.e. ASCII) is compatible with dchar, the upper half 
(0x80-0xFF) is specific to the encoding. A dchar cannot represent 
upper-half char code units; converting one gives you the code point with 
the same numeric value instead.

E.g.:

void main(){
    import std.stdio, std.utf;
    foreach(dchar d; "ö".byCodeUnit)
        writeln(d); // "Ã", "¶"
}
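A slightly extended sketch of the same example, contrasting the code-unit loop with actual decoding. It uses `std.utf.byDchar`, which decodes UTF-8 into code points; the variable name `s` and the extra assertions are illustrative additions, not part of the original example. The final assertions restate the point that a UTF-8 code unit and a code point do not compare meaningfully, even though the comparison compiles.

```d
import std.stdio : writeln;
import std.utf : byCodeUnit, byDchar;

void main()
{
    string s = "ö"; // U+00F6, encoded in UTF-8 as the code units 0xC3 0xB6
    assert(s.length == 2);

    // Converting each char to dchar keeps its numeric value, so the
    // code units 0xC3 and 0xB6 turn into the unrelated code points
    // U+00C3 and U+00B6.
    foreach (dchar d; s.byCodeUnit)
        writeln(d); // prints Ã, then ¶

    // Decoding the UTF-8 sequence instead yields the single code point U+00F6.
    foreach (d; s.byDchar)
        writeln(d); // prints ö

    // Comparing a UTF-8 code unit with a code point compiles, but the
    // result says nothing about the text: 0xC3 is not 0xF6.
    assert(s[0] != 'ö');
    assert(cast(dchar) s[0] == '\u00C3');
}
```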



More information about the Digitalmars-d mailing list