The Case Against Autodecode
tsbockman via Digitalmars-d
digitalmars-d at puremagic.com
Thu Jun 2 23:09:37 PDT 2016
On Thursday, 2 June 2016 at 21:00:17 UTC, tsbockman wrote:
> However, this document is very old - from Unicode 3.0 and the
> year 2000:
>
>> While there are no surrogate characters in Unicode 3.0
>> (outside of private use characters), future versions of
>> Unicode will contain them...
>
> Perhaps level 1 has since been redefined?
I found the latest (unofficial) draft version:
http://www.unicode.org/reports/tr18/tr18-18.html
Relevant changes:
* Level 1 is to be redefined as working on code points, not code
units:
> A fundamental requirement is that Unicode text be interpreted
> semantically by code point, not code units.
* Level 2 (graphemes) is explicitly described as a "default
level":
> This is still a default level—independent of country or
> language—but provides much better support for end-user
> expectations than the raw level 1...
* All mention of level 2 being slow has been removed. The only
reason given for making it toggle-able is for compatibility with
level 1 algorithms:
> Level 2 support matches much more what user expectations are
> for sequences of Unicode characters. It is still
> locale-independent and easily implementable. However, for
> compatibility with Level 1, it is useful to have some sort of
> syntax that will turn Level 2 support on and off.
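To make the three units concrete, here is a minimal Python sketch (not taken from the report). The helper names `utf8_code_units`, `utf16_code_units` and `approx_graphemes` are made up for illustration. `approx_graphemes` is only a rough stand-in for Level 2: it attaches combining marks to the preceding code point and does not implement the full grapheme cluster rules.

```python
import re
import unicodedata


def utf8_code_units(s):
    """Number of UTF-8 code units (bytes) needed to encode s."""
    return len(s.encode("utf-8"))


def utf16_code_units(s):
    """Number of UTF-16 code units (16-bit words) needed to encode s."""
    return len(s.encode("utf-16-le")) // 2


def approx_graphemes(s):
    """Hypothetical helper: group each code point with the combining marks
    that follow it. This is an approximation, NOT the full grapheme
    cluster segmentation that real Level 2 support would use."""
    clusters = []
    for ch in s:
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch  # attach the mark to the previous cluster
        else:
            clusters.append(ch)
    return clusters


emoji = "\U0001F600"   # a single code point outside the BMP
accented = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT

# Code units vs. code points: one code point, several code units.
assert len(emoji) == 1
assert utf8_code_units(emoji) == 4
assert utf16_code_units(emoji) == 2   # a surrogate pair in UTF-16

# A regex on str works by code point, so '.' matches the whole emoji.
assert re.fullmatch(".", emoji) is not None
# The same pattern on raw UTF-8 bytes works by code unit and does not.
assert re.fullmatch(b".", emoji.encode("utf-8")) is None

# Code points vs. graphemes: two code points, one user-perceived character.
assert len(accented) == 2
assert re.fullmatch(".", accented) is None
assert len(approx_graphemes(accented)) == 1
```

The `re` calls show the Level 1 behaviour described above: matching by code point handles the surrogate-pair case, but still splits what a user would see as a single accented letter.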