What is the legal range of chars?

monarch_dodra monarchdodra at gmail.com
Wed Jun 19 09:53:59 PDT 2013


On Wednesday, 19 June 2013 at 15:13:23 UTC, Ali Çehreli wrote:
> On 06/19/2013 05:34 AM, monarch_dodra wrote:
>
> > I know a "binary" char can hold the values 0 to 0xFF. However, I'm
> > wondering about the cases where a codepoint can fit inside a char. For
> > example, 'ç' is represented by 0xe7, which technically fits inside a char.
>
> 'ç' is represented by 0xe7 in an encoding that is not UTF-8. :)
>
> That would be a special agreement between the producer and the 
> consumer of that string. Otherwise, 0xe7 is not 'ç'. I 
> recommend ubyte[] for those cases.
>
> In UTF-8, 0xe7 is the first byte of a 3-byte code point:
>
> import std.stdio;
>
> void main()
> {
>     char[] a = [ 'a', 'b', 'c', 0xe7, 0x80, 0x80 ];
>     writeln(a);
> }
>
> Prints a Chinese character:
>
> abc瀀
>
> Ali

Hmm... well, that's true for UTF-8 strings: if the _code unit_ 
0xe7 appears, it is not 'ç'.
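
To make that concrete, here is a minimal sketch of how 'ç' actually looks 
inside a UTF-8 string. It assumes std.utf.decode behaves as documented; 
firstCodePoint is a hypothetical helper introduced only for illustration.

```d
import std.stdio : writefln;
import std.utf : decode;

// Hypothetical helper: decodes the first code point of a UTF-8 string.
dchar firstCodePoint(string s)
{
    size_t index = 0;
    return decode(s, index);
}

void main()
{
    string s = "\u00E7"; // 'ç', written as an escape to avoid source-encoding issues

    // In UTF-8, U+00E7 occupies two code units, and neither of them is 0xE7.
    assert(s.length == 2);
    assert(s[0] == 0xC3 && s[1] == 0xA7);

    // Decoding those two code units gives back the code point 0xE7.
    assert(firstCodePoint(s) == 0xE7);

    writefln("%d code units, code point U+%04X",
             s.length, cast(uint) firstCodePoint(s));
}
```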

But when handling a single 'char', there is no encoding; it "should" be 
a raw _code point_.
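
As a rough sketch of what that interpretation would mean in practice: the 
value held in the lone char is widened to a dchar and treated as a code 
point, and it only becomes valid UTF-8 once it is re-encoded. This is one 
possible reading, not an established convention; toUtf8 is a hypothetical 
helper, and the only library call assumed is std.utf.encode.

```d
import std.utf : encode;

// Hypothetical helper: returns the UTF-8 code units for one code point.
char[] toUtf8(dchar c)
{
    char[4] buf;
    immutable len = encode(buf, c);
    return buf[0 .. len].dup;
}

void main()
{
    char cu = 0xE7;  // the number 0xE7 stored in a single char
    dchar cp = cu;   // implicit widening: now read as code point U+00E7

    assert(cp == '\u00E7'); // under this reading, the value is 'ç'

    // Putting it into a UTF-8 string requires two code units, not one.
    auto units = toUtf8(cp);
    assert(units.length == 2);
    assert(units[0] == 0xC3 && units[1] == 0xA7);
}
```

Note that this reading only works for values up to 0xFF, which is the same 
range as Latin-1.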

I'm not really sure *if* these cases should be handled, nor how :/

