VLERange: a range in between BidirectionalRange and RandomAccessRange
Andrei Alexandrescu
SeeWebsiteForEmail at erdani.org
Fri Jan 14 14:04:08 PST 2011
On 1/14/11 7:50 AM, Michel Fortin wrote:
> On 2011-01-13 23:23:10 -0500, Andrei Alexandrescu
> <SeeWebsiteForEmail at erdani.org> said:
>
>> On 1/13/11 7:09 PM, Michel Fortin wrote:
>>> That's forgetting that most of the time people care about graphemes
>>> (user-perceived characters), not code points.
>>
>> I'm not so sure about that. What do you base this assessment on? Denis
>> wrote a library that according to him does grapheme-related stuff
>> nobody else does. So apparently graphemes are not what people care
>> about (although they might be what they should care about).
>
> Apple implemented all these things in the NSString class in Cocoa. They
> did all this work on Unicode at the beginning of Mac OS X, at a time
> when making such changes wouldn't break anything.
>
> It's a hard thing to change later when you have code that depends on the
> old behaviour. It's a complicated matter and not so many people will
> understand the issues, so it's no wonder many languages just deal with
> code points.

That's a strong indicator, but we shouldn't get ahead of ourselves.

D took a certain risk by defaulting to Unicode at a time when the
dominant extant systems languages left the decision to more or less
exotic libraries, Java used UTF-16 de jure but UCS-2 de facto, and other
languages were just starting to adopt Unicode.

I think that risk was justified because the relative loss in speed was
often acceptable and the gains were there. Even so, there are people in
this group who protest against the loss in efficiency and argue that life
is harder for ASCII users.

Switching to variable-length representation of graphemes as bundles of
dchars and committing to that through and through will bring with it a
larger hit in efficiency and an increased difficulty in usage. I agree
that at some level that's the "right" thing to do, but I don't yet have
the feeling that combining characters are a widely-adopted winner. For
the most part, fonts don't support combining characters, and as a font
dilettante I can tell that putting arbitrary sets of diacritics on top
of characters is not what one should be doing as it'll look terrible.
Unicode is begrudgingly acknowledging combining characters. Only a
handful of libraries deal with them. I don't know how many applications
need or care for them, versus how many applications do fine with
precombined characters. I have trouble getting combining characters to
combine on this machine in any of the applications I use - and this is a
Mac.
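
To make the precombined-versus-combining distinction concrete, here is a
minimal sketch. It is in Python rather than D, and it is only an
illustration: the helper names (code_points, approx_graphemes, same_text)
are hypothetical, and the grapheme count is a rough approximation that
skips combining marks, not full Unicode text segmentation. It relies only
on the standard unicodedata module.

```python
import unicodedata

# The same user-perceived character written two ways.
precombined = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE (one code point)
combining = "e\u0301"    # 'e' followed by COMBINING ACUTE ACCENT (two code points)


def code_points(s):
    """Number of code points in s (what code-point iteration sees)."""
    return len(s)


def approx_graphemes(s):
    """Rough grapheme count: code points that are not combining marks.

    This is a simplification for illustration, not a complete
    grapheme-cluster segmentation algorithm.
    """
    return sum(1 for ch in s if unicodedata.combining(ch) == 0)


def same_text(a, b):
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


if __name__ == "__main__":
    print(code_points(precombined), code_points(combining))
    print(approx_graphemes(precombined), approx_graphemes(combining))
    print(precombined == combining, same_text(precombined, combining))
```

Under these assumptions the two spellings differ in code-point count and
compare unequal as raw strings, yet count as one grapheme each and compare
equal once normalized. That gap is what a grapheme-aware string type would
have to pay for on every operation.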
Andrei