What is the legal range of chars?

Ali Çehreli acehreli at yahoo.com
Wed Jun 19 08:13:22 PDT 2013


On 06/19/2013 05:34 AM, monarch_dodra wrote:

 > I know a "binary" char can hold the values 0 to 0xFF. However, I'm
 > wondering about the cases where a codepoint can fit inside a char. For
 > example, 'ç' is represented by 0xe7, which technically fits inside a
 > char.

'ç' is represented by the single byte 0xe7 only in an encoding that is
not UTF-8. :) (In UTF-8, 'ç' is the two bytes 0xc3 0xa7.)

That would be a special agreement between the producer and the consumer 
of that string. Otherwise, 0xe7 is not 'ç'. I recommend ubyte[] for 
those cases.
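
As a rough sketch of the ubyte[] approach, assuming the agreed encoding
is ISO-8859-1 (Latin-1), which is my guess and not something the
question states: every Latin-1 byte value equals its Unicode code point,
so widening each byte to dchar and appending it to a string produces
UTF-8. latin1ToUtf8 is a hypothetical helper name, not a Phobos function.

```d
import std.stdio;

// Hypothetical helper. Assumes the bytes are ISO-8859-1 (Latin-1), where
// the byte value is the Unicode code point. Appending a dchar to a string
// encodes it as UTF-8.
string latin1ToUtf8(const(ubyte)[] bytes)
{
    string result;
    foreach (b; bytes)
        result ~= cast(dchar) b;
    return result;
}

void main()
{
    // 0xe7 means 'ç' here only because we agreed on Latin-1.
    ubyte[] raw = [ 0x61, 0x62, 0xe7 ];
    string s = latin1ToUtf8(raw);

    writeln(s);          // abç
    writeln(s.length);   // 4: 'ç' takes two bytes (0xc3 0xa7) in UTF-8
}
```

Keeping the raw data as ubyte[] until it is converted means no char in
the program ever holds a byte that is not UTF-8.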

In UTF-8, 0xe7 is the first byte of a 3-byte sequence that encodes a
single code point:

import std.stdio;

void main()
{
     char[] a = [ 'a', 'b', 'c', 0xe7, 0x80, 0x80 ];
     writeln(a);
}

Prints a Chinese character (U+7000):

abc瀀
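
To make the byte vs. code point distinction visible, here is a small
sketch (not the only way to do it) that counts both and checks
well-formedness with std.utf.validate. isValidUtf8 is a hypothetical
wrapper I made up for the example.

```d
import std.stdio;
import std.range : walkLength;
import std.utf : validate, UTFException;

// Hypothetical wrapper: true if the bytes form well-formed UTF-8.
bool isValidUtf8(const(ubyte)[] bytes)
{
    try
    {
        validate(cast(const(char)[]) bytes);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    ubyte[] bytes = [ 0x61, 0x62, 0x63, 0xe7, 0x80, 0x80 ];
    auto s = cast(const(char)[]) bytes;

    writeln(s.length);       // 6 code units (bytes)
    writeln(s.walkLength);   // 4 code points: 'a', 'b', 'c', U+7000

    // A foreach with a dchar loop variable decodes UTF-8 as it goes.
    foreach (dchar c; s)
        writefln("U+%04X", cast(uint) c);

    writeln(isValidUtf8(bytes));            // true
    writeln(isValidUtf8([ 0x61, 0xe7 ]));   // false: 0xe7 lacks its two continuation bytes
}
```

So a lone 0xe7 in a char[] is not 'ç'; it is an incomplete UTF-8
sequence.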

Ali


