The Case Against Autodecode

deadalnix via Digitalmars-d digitalmars-d at puremagic.com
Thu Jun 2 14:25:50 PDT 2016


On Thursday, 2 June 2016 at 20:27:27 UTC, Walter Bright wrote:
> On 6/2/2016 12:34 PM, deadalnix wrote:
>> On Thursday, 2 June 2016 at 19:05:44 UTC, Andrei Alexandrescu wrote:
>>> Pretty much everything. Consider s and s1 string variables with possibly different encodings (UTF8/UTF16).
>>>
>>> * s.all!(c => c == 'ö') works only with autodecoding. Without it, it always returns false.
>>>
>>
>> False. Many characters can be represented by different sequences of codepoints. For instance, ê can be encoded as a single codepoint (U+00EA) or as e followed by a combining ^ (U+0302). ö is one such character.
>
> There are 3 levels of Unicode support. What Andrei is talking about is Level 1.
>
> http://unicode.org/reports/tr18/tr18-5.1.html
>
> I wonder what rationale there is for Unicode to have two different sequences of codepoints be treated as the same. It's madness.

To be able to convert back and forth between Unicode and pre-existing encodings in a lossless manner.
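To make the disagreement concrete, here is a minimal sketch in D. It assumes a Phobos where narrow strings still autodecode to `dchar` and where `std.uni.normalize` with the `NFC`/`NFD` aliases is available; the variable names are illustrative only. It shows that the predicate from the quoted example holds for the precomposed ö but fails for the canonically equivalent decomposed sequence, and that normalizing first restores the match.

```d
import std.algorithm.searching : all;
import std.stdio : writeln;
import std.uni : normalize, NFC, NFD;

void main()
{
    // "ö" as one precomposed code point: U+00F6.
    string precomposed = "\u00F6";
    // The same user-perceived character as 'o' followed by
    // U+0308 COMBINING DIAERESIS (two code points).
    string decomposed = "o\u0308";

    // Autodecoding yields code points (dchar), so the
    // single-code-point form satisfies the predicate...
    assert(precomposed.all!(c => c == 'ö'));
    // ...but the decomposed form yields 'o' then U+0308, so it fails.
    assert(!decomposed.all!(c => c == 'ö'));

    // The two strings differ code unit for code unit...
    assert(precomposed != decomposed);
    // ...yet they are canonically equivalent: normalizing to a
    // common form makes them compare equal.
    assert(normalize!NFC(decomposed) == precomposed);
    assert(normalize!NFD(precomposed) == decomposed);

    // After NFC normalization the code-point comparison works again.
    assert(normalize!NFC(decomposed).all!(c => c == 'ö'));

    writeln("all checks passed");
}
```

Normalizing both sides to one form (NFC or NFD) before comparing is the usual remedy; autodecoding by itself only turns code units into code points and cannot do it.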


