The Case Against Autodecode

Walter Bright via Digitalmars-d digitalmars-d at puremagic.com
Thu Jun 2 13:27:27 PDT 2016


On 6/2/2016 12:34 PM, deadalnix wrote:
> On Thursday, 2 June 2016 at 19:05:44 UTC, Andrei Alexandrescu wrote:
>> Pretty much everything. Consider s and s1 string variables with possibly
>> different encodings (UTF8/UTF16).
>>
>> * s.all!(c => c == 'ö') works only with autodecoding. It returns always false
>> without.
>>
>
> False. Many characters can be represented by different sequences of codepoints.
> For instance, ê can be ê as one codepoint or ^ as a modifier followed by e. ö is
> one such character.

There are 3 levels of Unicode support, as defined in UTS #18 (Unicode Regular Expressions). What Andrei is talking about is Level 1, which works on individual code points.

http://unicode.org/reports/tr18/tr18-5.1.html
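Here is a small sketch of what Level 1 matching means for Andrei's example and deadalnix's objection. It assumes Phobos with autodecoding, so iterating a `string` with `all` yields `dchar` code points. The helper name `allOUmlaut` is made up for illustration. In Unicode the combining mark follows the base letter, so the decomposed spelling of ö is 'o' followed by U+0308 COMBINING DIAERESIS.

```d
import std.algorithm.searching : all;
import std.range.primitives : walkLength;
import std.stdio : writeln;

// U+00F6 LATIN SMALL LETTER O WITH DIAERESIS, as a single code point.
enum dchar oUmlaut = '\u00F6';

// Hypothetical helper: Level 1 style check that is true only if every
// code point equals U+00F6. `all` auto-decodes the string into dchars.
bool allOUmlaut(string s)
{
    return s.all!(c => c == oUmlaut);
}

void main()
{
    string precomposed = "\u00F6";   // one code point
    string decomposed  = "o\u0308";  // 'o' + U+0308 COMBINING DIAERESIS

    // walkLength on a narrow string counts decoded code points.
    writeln(precomposed.walkLength, " ", allOUmlaut(precomposed)); // 1 true
    writeln(decomposed.walkLength, " ", allOUmlaut(decomposed));   // 2 false
}
```

Both strings render as "ö", but a code-point comparison only matches the precomposed one.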

I wonder what rationale Unicode has for treating two different sequences of
code points as the same character. It's madness.
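For reference, Unicode calls this canonical equivalence, and normalization is how a program decides whether two sequences count as "the same". Below is a minimal sketch using Phobos `std.uni.normalize`. The helper `canonicallyEqual` is a hypothetical name for illustration, not a Phobos function.

```d
import std.stdio : writeln;
import std.uni : normalize, NFC, NFD;

// Hypothetical helper: true if a and b are canonically equivalent,
// i.e. equal after both are brought to Normalization Form C (composed).
bool canonicallyEqual(string a, string b)
{
    return normalize!NFC(a) == normalize!NFC(b);
}

void main()
{
    string precomposed = "\u00F6";  // U+00F6
    string decomposed  = "o\u0308"; // 'o' + U+0308 COMBINING DIAERESIS

    writeln(precomposed == decomposed);                 // false: different code points
    writeln(canonicallyEqual(precomposed, decomposed)); // true
    writeln(normalize!NFD(precomposed) == decomposed);  // true
}
```

Normalizing both sides before comparing costs an extra pass and possibly an allocation, which is why plain code-point comparison does not do it implicitly.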

