Unicode handling comparison
Dmitry Olshansky
dmitry.olsh at gmail.com
Wed Nov 27 12:13:22 PST 2013
27-Nov-2013 22:12, H. S. Teoh wrote:
> On Wed, Nov 27, 2013 at 10:07:43AM -0800, Andrei Alexandrescu wrote:
>> On 11/27/13 7:43 AM, Jakob Ovrum wrote:
>>> On that note, I tried to use std.uni to write a simple example of how
>>> to correctly handle this in D, but it became apparent that std.uni
>>> should expose something like `byGrapheme` which lazily transforms a
>>> range of code points to a range of graphemes (probably needs a
>>> `byCodePoint` to do the converse too). The two extant grapheme
>>> functions, `decodeGrapheme` and `graphemeStride`, are *awful* for
>>> string manipulation (granted, they are probably perfect for text
>>> rendering).
>>
>> Yah, byGrapheme would be a great addition.
> [...]
>
> +1. This is better than the GraphemeString / i18nString proposal
> elsewhere in this thread, because it discourages people from using
> graphemes (poor performance) unless actually necessary.
I could have sworn we had byGrapheme somewhere, well apparently not :(
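For illustration, here is a minimal sketch of the kind of lazy range
being asked for, built on the existing graphemeStride. The names
GraphemeSlices, graphemeSlices and countGraphemes are made up for this
example and are not std.uni API. It assumes the input is a UTF-8 string
and yields each grapheme as a slice of the original, so nothing is
copied or allocated.

```d
import std.uni : graphemeStride;

// Hypothetical lazy input range: each element is the slice of the
// original string that makes up one grapheme cluster.
struct GraphemeSlices
{
    private string rest;

    @property bool empty() const { return rest.length == 0; }

    // graphemeStride returns the cluster length in code units,
    // so it can be used directly as a slice bound.
    @property string front() const
    {
        return rest[0 .. graphemeStride(rest, 0)];
    }

    void popFront()
    {
        rest = rest[graphemeStride(rest, 0) .. $];
    }
}

// Hypothetical convenience constructor.
GraphemeSlices graphemeSlices(string s)
{
    return GraphemeSlices(s);
}

// Usage example: count user-perceived characters.
size_t countGraphemes(string s)
{
    size_t n;
    foreach (g; graphemeSlices(s))
        ++n;
    return n;
}
```

With this, "noe\u0308l" (an 'e' followed by a combining diaeresis) is
walked as four graphemes even though it holds five code points. A real
byGrapheme would presumably accept any range of code points rather
than just string.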
BTW I believe that GraphemeString could still be a valuable addition. I
know of at least one good implementation that gives you O(1) grapheme
access with a nice memory footprint. It has many benefits, but there
are two chief problems with it:
a) It doesn't solve interchange at all - you'd have to encode on
write and re-code on read
b) It relies on having global shared state across the whole program, and
that's the real show-stopper
In any case it's a direction well worth exploring.
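To make the two problems concrete, here is a rough sketch of how such
a design could look. This is only a guess at the general idea, not the
implementation referred to above, and every name in it
(InternedGraphemes, clusterById, idByCluster) is hypothetical. Each
distinct grapheme cluster gets a fixed-width id from a table, which is
what makes indexing O(1); the table is what every string has to share,
and the ids mean nothing outside the program.

```d
import std.uni : graphemeStride;

// Hypothetical intern table shared by all strings: problem (b).
// Module-level variables in D are thread-local by default; a real
// program-wide table would need shared storage and synchronization.
private string[] clusterById;
private uint[string] idByCluster;

struct InternedGraphemes
{
    private uint[] ids; // one fixed-width id per grapheme

    // Problem (a): incoming UTF-8 must be encoded into ids on read...
    static InternedGraphemes encode(string s)
    {
        InternedGraphemes result;
        while (s.length)
        {
            immutable n = graphemeStride(s, 0);
            string cluster = s[0 .. n];
            if (auto known = cluster in idByCluster)
            {
                result.ids ~= *known;
            }
            else
            {
                immutable id = cast(uint) clusterById.length;
                clusterById ~= cluster;
                idByCluster[cluster] = id;
                result.ids ~= id;
            }
            s = s[n .. $];
        }
        return result;
    }

    @property size_t length() const { return ids.length; }

    // O(1) access to the i-th grapheme.
    string opIndex(size_t i) const { return clusterById[ids[i]]; }

    // ...and decoded back to UTF-8 on write.
    string decode() const
    {
        string s;
        foreach (id; ids)
            s ~= clusterById[id];
        return s;
    }
}
```

For example, InternedGraphemes.encode("noe\u0308l") has length 4 and
its element at index 2 is the whole "e\u0308" cluster.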
>
>
> T
>
--
Dmitry Olshansky