The Case Against Autodecode

Fri Jun 3 12:12:03 PDT 2016

On 02-Jun-2016 23:27, Walter Bright wrote:
> On 6/2/2016 12:34 PM, deadalnix wrote:
>> On Thursday, 2 June 2016 at 19:05:44 UTC, Andrei Alexandrescu wrote:
>>> Pretty much everything. Consider s and s1 string variables with possibly
>>> different encodings (UTF8/UTF16).
>>>
>>> * s.all!(c => c == 'ö') works only with autodecoding. It returns
>>> always false without.
>>>
>>
>> False. Many characters can be represented by different sequences of
>> codepoints. For instance, ê can be ê as one codepoint or ^ as a
>> modifier followed by e. ö is one such character.
>
> There are 3 levels of Unicode support. What Andrei is talking about is
> Level 1.
>
> http://unicode.org/reports/tr18/tr18-5.1.html
>
> I wonder what rationale there is for Unicode to have two different
> sequences of codepoints be treated as the same. It's madness.

Yeah, it seems Unicode was never meant to be easy. Or this is simply
what happens with an evolutionary design that started from "everything
is a 16-bit character".
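
To make the disagreement quoted above concrete, here is a minimal sketch
of both sides. It assumes Phobos with autodecoding and std.uni's
normalize; allAreOUmlaut is a hypothetical helper named only for this
illustration, not anything from the thread or from Phobos.

```d
import std.algorithm.searching : all;
import std.uni : NFC, normalize;

// Hypothetical helper: the predicate from the quoted example.
bool allAreOUmlaut(string s)
{
    // With autodecoding, `all` sees decoded code points (dchar),
    // so a single precomposed U+00F6 compares equal.
    return s.all!(c => c == '\u00F6');
}

void main()
{
    string precomposed = "\u00F6"; // one code point: U+00F6
    string decomposed = "o\u0308"; // 'o' plus combining diaeresis U+0308

    // Code-point comparison matches the precomposed form...
    assert(allAreOUmlaut(precomposed));
    // ...but not the decomposed form, which renders the same.
    assert(!allAreOUmlaut(decomposed));
    // Normalizing to NFC first makes the two spellings agree.
    assert(allAreOUmlaut(normalize!NFC(decomposed)));
}
```

So the check holds at the code-point level only when the input happens
to be precomposed; treating both spellings as equal needs normalization
on top.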

-- 
Dmitry Olshansky