The Case Against Autodecode

tsbockman via Digitalmars-d digitalmars-d at puremagic.com
Thu Jun 2 23:09:37 PDT 2016


On Thursday, 2 June 2016 at 21:00:17 UTC, tsbockman wrote:
> However, this document is very old - from Unicode 3.0 and the 
> year 2000:
>
>> While there are no surrogate characters in Unicode 3.0 
>> (outside of private use characters), future versions of 
>> Unicode will contain them...
>
> Perhaps level 1 has since been redefined?

I found the latest (unofficial) draft version of that document 
(UTS #18, Unicode Regular Expressions):
     http://www.unicode.org/reports/tr18/tr18-18.html

Relevant changes:

* Level 1 is to be redefined as working on code points, not code 
units:

> A fundamental requirement is that Unicode text be interpreted 
> semantically by code point, not code units.

* Level 2 (graphemes) is explicitly described as a "default 
level":

> This is still a default level—independent of country or 
> language—but provides much better support for end-user 
> expectations than the raw level 1...

* All mention of level 2 being slow has been removed. The only 
reason given for making it possible to toggle is compatibility 
with level 1 algorithms:

> Level 2 support matches much more what user expectations are 
> for sequences of Unicode characters. It is still 
> locale-independent and easily implementable. However, for 
> compatibility with Level 1, it is useful to have some sort of 
> syntax that will turn Level 2 support on and off.
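
To make the distinction between code units, code points (level 1) and 
graphemes (level 2) concrete, here is a minimal sketch in D. The helper 
names (codeUnitCount, codePointCount, graphemeCount) are hypothetical and 
exist only for this illustration; the Phobos calls are std.utf.byCodeUnit, 
std.range.walkLength and std.uni.byGrapheme. It assumes a Phobos with 
autodecoding, so walking a string as a range yields code points.

```d
import std.range : walkLength;
import std.stdio : writeln;
import std.uni : byGrapheme;
import std.utf : byCodeUnit;

// Hypothetical helper names, used only for this illustration.

/// Counts UTF-8 code units (bytes), with no decoding at all.
size_t codeUnitCount(string s)
{
    return s.byCodeUnit.walkLength;
}

/// Counts code points: a string viewed as a range autodecodes to dchar.
size_t codePointCount(string s)
{
    return s.walkLength;
}

/// Counts grapheme clusters, i.e. user-perceived characters.
size_t graphemeCount(string s)
{
    return s.byGrapheme.walkLength;
}

void main()
{
    // 'e' followed by U+0301 COMBINING ACUTE ACCENT:
    // one user-perceived character built from two code points.
    string s = "e\u0301";

    writeln(codeUnitCount(s));  // 3 (1 byte for 'e', 2 for U+0301)
    writeln(codePointCount(s)); // 2
    writeln(graphemeCount(s));  // 1
}
```

The three counts differ for the same string, which is why it matters 
which level an algorithm is defined on.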


