What is the legal range of chars?
monarch_dodra
monarchdodra at gmail.com
Wed Jun 19 09:53:59 PDT 2013
On Wednesday, 19 June 2013 at 15:13:23 UTC, Ali Çehreli wrote:
> On 06/19/2013 05:34 AM, monarch_dodra wrote:
>
> > I know a "binary" char can hold the values 0 to 0xFF. However, I'm
> > wondering about the cases where a codepoint can fit inside a char. For
> > example, 'ç' is represented by 0xe7, which technically fits inside a char.
>
> 'ç' is represented by 0xe7 in an encoding that is not UTF-8. :)
>
> That would be a special agreement between the producer and the
> consumer of that string. Otherwise, 0xe7 is not 'ç'. I
> recommend ubyte[] for those cases.
>
> In UTF-8, 0xe7 is the first byte of a 3-byte sequence encoding one code point:
>
> import std.stdio;
>
> void main()
> {
>     char[] a = [ 'a', 'b', 'c', 0xe7, 0x80, 0x80 ];
>     writeln(a);
> }
>
> Prints a Chinese character:
>
> abc瀀
>
> Ali
Hmm... well, that's true for UTF-8 strings: if the _codeunit_
0xe7 appears, it is not 'ç'.
But when handling a single 'char', there is no encoding; it "should" be
a raw _codepoint_.
I'm not really sure *if* these cases should be handled, nor how :/
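
To make the distinction concrete, here is a minimal sketch, assuming only std.utf from Phobos. The helpers isStandalone and isValidUtf8 are hypothetical names made up for illustration and are not part of any library. It contrasts a lone char holding 0xe7 with a dchar holding the code point U+00E7, whose UTF-8 encoding is the two code units 0xC3 0xA7.

```d
import std.stdio : writefln;
import std.utf : encode, validate, UTFException;

// Hypothetical helper: true if `c` is a complete code point on its own,
// i.e. an ASCII code unit in the range 0x00 to 0x7F.
bool isStandalone(char c)
{
    return c < 0x80;
}

// Hypothetical helper: true if `s` is well-formed UTF-8.
bool isValidUtf8(const(char)[] s)
{
    try
    {
        validate(s); // throws UTFException on malformed input
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    char  c = 0xe7; // a lone UTF-8 lead byte, not 'ç'
    dchar d = 'ç';  // the code point U+00E7

    // Encode the code point as UTF-8 code units.
    char[4] buf;
    immutable n = encode(buf, d);

    writefln("char 0xe7 standalone: %s", isStandalone(c));
    writefln("char 0xe7 valid as a string: %s", isValidUtf8([c]));
    writefln("dchar value: 0x%x", cast(uint) d);
    writefln("UTF-8 length of the code point: %s", n);
    writefln("UTF-8 code units: %02x %02x",
             cast(ubyte) buf[0], cast(ubyte) buf[1]);
}
```

Under this reading, only values below 0x80 are safe to treat as a codepoint when stored in a single char; anything else would need a dchar, or ubyte if the data is not UTF-8 at all.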