Why doesn't part of the D community want to move to D2?

Daniel Gibson metalcaedes at gmail.com
Thu Nov 11 16:00:18 PST 2010


Walter Bright wrote:
> spir wrote:
>> In my view, there is a missing level of abstraction in common UString
>> processing libs and types. How do you count the "â"s in a text? How do
>> you find one? Above, indexOf fails because my editor uses a precomposed
>> code, while the source (here a literal) uses another form.
>> To produce meaningful results, and to use simple routines
>> like index, find, count... the way we used to with single-length
>> character sets, there should be a grouping phase on top of decoding;
>> we would then process arrays of "stacks" representing characters, not
>> arrays of codes. To search, it's also necessary to have all characters
>> in normalised form, so that both "â"s would match: another phase.
>> Unicode provides algorithms for those phases in constructing string
>> representations -- but everyone seems to ignore the issues... s[0..1]
>> would then return the first character, not the first code of the
>> "stack" representing the first character.
> 
> 
> 
> http://www.digitalmars.com/d/2.0/phobos/std_utf.html

If I'm not mistaken, those functions don't handle these "graphemes", i.e.
things that appear as one character on the screen but consist of
multiple code *points*. Take spir's "â", which in UTF-8 is encoded as the
following bytes: 0x61 (=='a'), 0xCC, 0x82 (or \u0061\u0302 as code points, i.e. in UTF-32).
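To make the three levels concrete (code units, code points, graphemes) and spir's point about normalisation, here is a small sketch. It assumes a current Phobos, where `std.uni` provides `byGrapheme` and `normalize`; those came in later releases and are not part of the `std.utf` page linked above.

```d
import std.range : walkLength;
import std.stdio : writeln;
import std.uni : byGrapheme, normalize, NFC;
import std.utf : count;

// The same visible character "â" in two Unicode forms.
enum precomposed = "\u00E2";       // U+00E2, a single code point
enum decomposed  = "\u0061\u0302"; // 'a' + U+0302 COMBINING CIRCUMFLEX ACCENT

void main()
{
    writeln(decomposed.length);                // UTF-8 code units (bytes): 3
    writeln(count(decomposed));                // code points: 2
    writeln(decomposed.byGrapheme.walkLength); // graphemes (user-perceived chars): 1

    // A plain comparison fails because the code point sequences differ...
    writeln(precomposed == decomposed);                // false
    // ...but after NFC normalisation both forms are identical.
    writeln(normalize!NFC(decomposed) == precomposed); // true
}
```

Normalising both the haystack and the needle to one form (NFC here) before calling a search routine is what lets both "â"s match.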

Also, a function returning the physical position (i.e. the index in the array of chars or
wchars) of logical character #logPos may be useful, e.g. for fixed-width printing:
   size_t getPhysPos(char[] str, size_t logPos)
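The proposed function could be sketched roughly as follows. This assumes that "logical char" means a grapheme and that a current Phobos is available, since it relies on `std.uni.graphemeStride`. The parameter is widened to `const(char)[]` so string literals can be passed. Everything apart from the proposed signature is hypothetical and for illustration only.

```d
import std.stdio : writeln;
import std.uni : graphemeStride;

/// Hypothetical helper: returns the index in `str` of the first code unit
/// of grapheme number `logPos` (0-based). Returns str.length if `str`
/// contains fewer than `logPos` graphemes.
size_t getPhysPos(const(char)[] str, size_t logPos)
{
    size_t phys = 0;
    foreach (_; 0 .. logPos)
    {
        if (phys >= str.length)
            break;
        // Skip over the whole grapheme cluster starting at `phys`.
        phys += graphemeStride(str, phys);
    }
    return phys;
}

void main()
{
    string s = "xa\u0302y";     // 'x', decomposed "â", 'y'
    writeln(getPhysPos(s, 2)); // 'y' starts at byte 4
}
```

Note that graphemes still don't map exactly onto screen columns (wide East Asian characters take two), so this is only a first step for fixed-width output.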

Cheers,
- Daniel

