Unicode handling comparison

Wed Nov 27 12:13:22 PST 2013

27-Nov-2013 22:12, H. S. Teoh wrote:
> On Wed, Nov 27, 2013 at 10:07:43AM -0800, Andrei Alexandrescu wrote:
>> On 11/27/13 7:43 AM, Jakob Ovrum wrote:
>>> On that note, I tried to use std.uni to write a simple example of how
>>> to correctly handle this in D, but it became apparent that std.uni
>>> should expose something like `byGrapheme` which lazily transforms a
>>> range of code points to a range of graphemes (probably needs a
>>> `byCodePoint` to do the converse too). The two extant grapheme
>>> functions, `decodeGrapheme` and `graphemeStride`, are *awful* for
>>> string manipulation (granted, they are probably perfect for text
>>> rendering).
>>
>> Yah, byGrapheme would be a great addition.
> [...]
>
> +1. This is better than the GraphemeString / i18nString proposal
> elsewhere in this thread, because it discourages people from using
> graphemes (poor performance) except where actually necessary.

I could have sworn we had byGrapheme somewhere, but apparently not :(
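
For illustration, a minimal sketch of the kind of lazy range discussed above, built on the existing graphemeStride. ByGraphemeSketch is a hypothetical name, not something in std.uni, and it assumes the input is a UTF-8 string rather than an arbitrary range of code points:

```d
import std.uni : graphemeStride;

// Hypothetical illustration only: a lazy input range that yields each
// grapheme cluster of a UTF-8 string as a slice of the original string.
struct ByGraphemeSketch
{
    string s;

    @property bool empty() { return s.length == 0; }

    // graphemeStride returns the number of code units in the grapheme
    // that starts at the given index.
    @property string front() { return s[0 .. graphemeStride(s, 0)]; }

    void popFront() { s = s[graphemeStride(s, 0) .. $]; }
}

unittest
{
    // 'e' followed by U+0301 (combining acute accent), then 'x'
    auto r = ByGraphemeSketch("e\u0301x");
    assert(r.front == "e\u0301"); // two code points, one grapheme
    r.popFront();
    assert(r.front == "x");
    r.popFront();
    assert(r.empty);
}
```

Because nothing is computed until front or popFront is called, code that never asks for graphemes pays nothing for them, which is the point made in the quoted reply.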

BTW I believe that GraphemeString could still be a valuable addition. I
know of at least one good implementation that gives you O(1) grapheme
access with a nice memory footprint. It has many benefits, but two
chief problems:
a) It doesn't solve interchange at all - you'd have to encode on
write and re-code on read
b) It relies on global shared state across the whole program, and
that's the real show-stopper

In any case it's a direction well worth exploring.

-- 
Dmitry Olshansky