Unicode handling comparison

Wed Nov 27 12:06:32 PST 2013

27-Nov-2013 18:45, David Nadlinger wrote:
> On Wednesday, 27 November 2013 at 12:46:38 UTC, bearophile wrote:
>> Through Reddit I have seen this small comparison of Unicode handling
>> between different programming languages:
>>
>> http://mortoray.com/2013/11/27/the-string-type-is-broken/
>>
>> D+Phobos seem to fail most things (it produces BAFFLE):
>> http://dpaste.dzfl.pl/a5268c435
>
> If you need to perform this kind of operation on Unicode strings in D,
> you can call normalize (std.uni) on the string first to make sure it is
> in one of the Normalization Forms. For example, just appending
> .normalize to your strings (which defaults to NFC) would make the code
> produce the "expected" results.
>
> As far as I'm aware, this behavior is the result of a deliberate
> decision, as normalizing strings on the fly isn't really cheap.

It's anything but cheap.
At a minimum, imagine walking the entire string and doing a table lookup
for every code point.
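
To make that concrete, here is a minimal sketch. It is written in Python
rather than D, and it uses Python's standard unicodedata module as a
stand-in for std.uni's normalize, so it shows the idea and not the Phobos
implementation. The helper names nfc and has_combining_marks are made up
for this example. The test string is the "noël" case from the linked
article.

```python
import unicodedata

# Two spellings of the same word: precomposed U+00EB versus
# 'e' followed by U+0308 COMBINING DIAERESIS.
composed = "no\u00ebl"
decomposed = "noe\u0308l"

# Without normalization they differ in both content and length.
assert composed != decomposed
assert len(composed) == 4 and len(decomposed) == 5


def nfc(s):
    """Return s in Normalization Form C (hypothetical helper)."""
    return unicodedata.normalize("NFC", s)


def has_combining_marks(s):
    """Walk the string with one table lookup per code point.

    Even this simple check has to touch every code point.
    """
    return any(unicodedata.combining(ch) != 0 for ch in s)


# After normalizing both sides, the two spellings compare equal.
assert nfc(composed) == nfc(decomposed)

# Naive reversal of the decomposed form moves the diaeresis onto the 'l';
# reversing the NFC form keeps it on the 'e'.
assert decomposed[::-1] == "l\u0308eon"
assert nfc(decomposed)[::-1] == "l\u00ebon"

assert has_combining_marks(decomposed)
assert not has_combining_marks(composed)
```

Normalizing once, explicitly, at the point where the strings enter the
program keeps that per-code-point cost out of every later comparison or
reversal.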

>
> David

-- 
Dmitry Olshansky