VLERange: a range in between BidirectionalRange and RandomAccessRange

Andrei Alexandrescu SeeWebsiteForEmail at erdani.org
Fri Jan 14 14:04:08 PST 2011


On 1/14/11 7:50 AM, Michel Fortin wrote:
> On 2011-01-13 23:23:10 -0500, Andrei Alexandrescu
> <SeeWebsiteForEmail at erdani.org> said:
>
>> On 1/13/11 7:09 PM, Michel Fortin wrote:
>>> That's forgetting that most of the time people care about graphemes
>>> (user-perceived characters), not code points.
>>
>> I'm not so sure about that. What do you base this assessment on? Denis
>> wrote a library that according to him does grapheme-related stuff
>> nobody else does. So apparently graphemes is not what people care
>> about (although it might be what they should care about).
>
> Apple implemented all these things in the NSString class in Cocoa. They
> did all this work on Unicode at the beginning of Mac OS X, at a time
> where making such changes wouldn't break anything.
>
> It's a hard thing to change later when you have code that depend on the
> old behaviour. It's a complicated matter and not so many people will
> understand the issues, so it's no wonder many languages just deal with
> code points.

That's a strong indicator, but we shouldn't get ahead of ourselves.

D took a certain risk by defaulting to Unicode at a time when the 
dominant extant systems languages left the decision to more or less 
exotic libraries, Java used UTF-16 de jure but UCS-2 de facto, and other 
languages were just starting to adopt Unicode.

I think that risk was justified because the relative loss in speed was 
often acceptable and the gains were there. Even so, there are people in 
this group who protest against the loss in efficiency and argue that 
life is harder for ASCII users.

Switching to variable-length representation of graphemes as bundles of 
dchars and committing to that through and through will bring with it a 
larger hit in efficiency and an increased difficulty in usage. I agree 
that at some level that's the "right" thing to do, but I don't yet have 
the feeling that combining characters are a widely adopted winner. For 
the most part, fonts don't support combining characters, and as a font 
dilettante I can tell you that putting arbitrary sets of diacritics on 
top of characters is not what one should be doing, as it'll look terrible. 
Unicode is begrudgingly acknowledging combining characters. Only a 
handful of libraries deal with them. I don't know how many applications 
need or care for them, versus how many applications do fine with 
precombined characters. I have trouble getting combining characters to 
combine on this machine in any of the applications I use - and this is a 
Mac.
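
To make the precombined-versus-combining distinction concrete, here is a 
minimal sketch in Python (not D, and not taken from any library 
mentioned in this thread). The helper names utf8_units and bundles are 
made up for illustration, and bundles only approximates graphemes by 
attaching combining marks to the preceding base character; it is not the 
full Unicode grapheme cluster segmentation.

```python
import unicodedata

# The same user-perceived character, spelled two ways.
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE (one code point)
combining = "e\u0301"    # 'e' followed by COMBINING ACUTE ACCENT (two code points)


def utf8_units(text):
    """Number of UTF-8 code units (bytes) needed to encode text."""
    return len(text.encode("utf-8"))


def bundles(text):
    """Hypothetical helper: group each base character with the combining
    marks that follow it. This is an approximation of graphemes, not the
    complete segmentation rules."""
    result = []
    for ch in text:
        # unicodedata.combining returns a nonzero class for combining marks.
        if result and unicodedata.combining(ch) != 0:
            result[-1] += ch
        else:
            result.append(ch)
    return result


# Code point counts differ even though both strings display as one character.
print(len(precomposed), len(combining))            # 1 2
# So do the encoded sizes.
print(utf8_units(precomposed), utf8_units(combining))  # 2 3
# A plain comparison treats the two spellings as different strings.
print(precomposed == combining)                    # False
# Normalizing to NFC maps the combining sequence to the precombined form.
print(unicodedata.normalize("NFC", combining) == precomposed)  # True
# Bundling yields one unit for the accented letter and one for 'a'.
print(len(bundles(combining + "a")))               # 2
```

The bundling step is where the extra cost comes from: every traversal has 
to inspect each code point to decide where a bundle ends, on top of the 
decoding that code-point iteration already does.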


Andrei


More information about the Digitalmars-d mailing list