The Case Against Autodecode

tsbockman via Digitalmars-d digitalmars-d at puremagic.com
Thu Jun 2 13:36:01 PDT 2016


On Thursday, 2 June 2016 at 20:13:14 UTC, Andrei Alexandrescu 
wrote:
> On 06/02/2016 03:34 PM, tsbockman wrote:
>> Your 'ö' examples will NOT work reliably with auto-decoded code
>> points, and for nearly the same reason that they won't work with
>> code units; you would have to use byGrapheme.
>
> They do work per spec: find this code point. It would be 
> surprising if 'ö' were found but the string were positioned at 
> a different code point.

Your examples will pass or fail depending on how (and whether) 
the 'ö' grapheme is normalized. They only ever succeed because 
'ö' happens to be one of the privileged graphemes that *can* be 
(but often isn't!) represented as a single code point. Many other 
graphemes have no such representation.
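
To make the normalization point concrete, here is a minimal sketch. It is in Python rather than D only because Python's `str.find` also searches by code point; the helper name `find_code_point` and the sample word are made up for illustration and are not taken from the examples being discussed.

```python
import unicodedata

PRECOMPOSED = "\u00f6"   # 'ö' as a single code point (NFC form)
DECOMPOSED = "o\u0308"   # 'o' followed by COMBINING DIAERESIS (NFD form)

def find_code_point(haystack, needle):
    """Search by code point, which is what Python's str.find does."""
    return haystack.find(needle)

# The same visible text, stored in two different normalization forms.
text_nfc = "K" + PRECOMPOSED + "ln"
text_nfd = "K" + DECOMPOSED + "ln"

# Found when the text happens to be precomposed.
hit = find_code_point(text_nfc, PRECOMPOSED)

# Not found when the same text is stored decomposed.
miss = find_code_point(text_nfd, PRECOMPOSED)

# Found again only after the text is normalized to the needle's form.
after_normalizing = find_code_point(
    unicodedata.normalize("NFC", text_nfd), PRECOMPOSED
)
```

The search succeeds or fails on identical-looking input depending only on which form the text was stored in.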

Working directly with code points is sometimes useful anyway - 
but then, working with code units can be, also. Neither will lead 
to inherently "correct" Unicode processing, and in the absence of 
a compelling context, your examples fall completely flat as an 
argument for the inherent superiority of processing at the code 
point level.
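
As a rough illustration that neither level corresponds to what a reader sees as one character, the sketch below counts UTF-8 code units and code points for two single graphemes. Again this is Python, not D, and the helper `sizes` is a hypothetical name. The second grapheme, 'q' with a diaeresis, is chosen because Unicode has no precomposed code point for it.

```python
import unicodedata

def sizes(s):
    """Return (UTF-8 code units, code points) for the string s."""
    return len(s.encode("utf-8")), len(s)

# 'o' + COMBINING DIAERESIS composes to the single code point U+00F6.
o_umlaut = unicodedata.normalize("NFC", "o\u0308")

# 'q' + COMBINING DIAERESIS has no precomposed form, so NFC leaves it
# as two code points even though it renders as one grapheme.
q_umlaut = unicodedata.normalize("NFC", "q\u0308")

o_sizes = sizes(o_umlaut)   # (2, 1)
q_sizes = sizes(q_umlaut)   # (3, 2)
```

Both strings are one grapheme each, yet only the first is a single code point, and neither is a single code unit.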

>> The fact that you still don't get that, even after a dozen plus
>> attempts by the community to explain the difference, makes you
>> unfit to direct Phobos' Unicode support.
>
> Well there's gotta be a reason why my basic comprehension is 
> under constant scrutiny whereas yours is safe.

Who said mine is safe? I *know* that I'm not qualified to be in 
charge of this.

Your comprehension is under greater scrutiny because you are 
proposing to overrule nearly all other active contributors 
combined.

>> Please, either go study Unicode until you
>> really understand it, or delegate this issue to someone else.
>
> Would be happy to. To whom would I delegate?

If you're serious, I would suggest Dmitry Olshansky. He seems to 
be our top Unicode expert, based on his contributions to 
`std.uni` and `std.regex`. But if he is unwilling or unsuitable 
for some reason, there are other candidates participating in this 
thread (not me).

