Why does part of the D community not want to move to D2?
Daniel Gibson
metalcaedes at gmail.com
Thu Nov 11 16:00:18 PST 2010
Walter Bright wrote:
> spir wrote:
>> In my view, there is a missing level of abstraction in common UString
>> processing libs and types. How to count the "â"s in a text? How to
>> find one? Above, indexOf fails because my editor uses a precombined
>> code, while the source (here literal) uses another form.
>> To be able to produce meaningful results, and to use simple routines
>> like index, find, count..., the way we used to with single-length
>> character sets, there should be a grouping phase on top of decoding;
>> we would then process arrays of "stacks" representing characters, not
>> of codes. To search, it's also necessary to have all characters in
>> normalised form, so that both "â" would match: another phase.
>> Unicode provides algorithms for those phases in constructing string
>> representations -- but everyone seems to ignore the issues... s[0..1]
>> would then return the first character, not the first code of the
>> "stack" representing the first character.
>
>
>
> http://www.digitalmars.com/d/2.0/phobos/std_utf.html
If I'm not mistaken, those functions don't handle "graphemes", i.e. things that
appear as one character on screen but consist of multiple code *points*. spir's
"â" is an example: in its decomposed form it is encoded in UTF-8 as the bytes
0x61 (== 'a'), 0xCC, 0x82, i.e. the two code points \u0061\u0302 (each of them a
single code unit in UTF-32).
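To make the difference concrete, here is a minimal sketch. It assumes the
grapheme and normalization helpers found in std.uni of later Phobos releases
(byGrapheme, normalize); as far as I know they were not available when this
was written. The names countGraphemes and sameText are made up for
illustration.

```d
import std.range : walkLength;
import std.uni : byGrapheme, normalize, NFC;

/// Hypothetical helper: number of user-perceived characters
/// (grapheme clusters) in s, as opposed to code units or code points.
size_t countGraphemes(string s)
{
    return s.byGrapheme.walkLength;
}

/// Hypothetical helper: compare two strings after bringing both into
/// the same normalization form (NFC), so that precomposed and
/// decomposed spellings of the same text match.
bool sameText(string a, string b)
{
    return normalize!NFC(a) == normalize!NFC(b);
}
```

With these, "a\u0302" has a .length of 3 (UTF-8 code units) but counts as one
grapheme, and it only compares equal to the precomposed "\u00E2" after
normalization.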
Also, a function returning the physical position (i.e. the index in the array of
chars or wchars) of logical char #logPos may be useful, e.g. for fixed-width printing:
size_t getPhysPos(char[] str, size_t logPos)
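A rough sketch of how that could look, assuming a "logical char" simply means
one code point and using std.utf.stride to skip over each encoded sequence.
The const(char)[] parameter and the exception on overrun are my own choices
here, not a settled design.

```d
import std.utf : stride;

/// Sketch: returns the index (in UTF-8 code units) at which the
/// code point number logPos starts. Assumes str is valid UTF-8.
size_t getPhysPos(const(char)[] str, size_t logPos)
{
    size_t physPos = 0;
    foreach (_; 0 .. logPos)
    {
        if (physPos >= str.length)
            throw new Exception("logPos is past the end of the string");
        // stride gives the length in code units of the sequence at physPos
        physPos += stride(str, physPos);
    }
    return physPos;
}
```

For "a\u0302b", getPhysPos(str, 2) gives 3, the index of 'b'. Note that this
still treats the combining accent as a character of its own; for real
fixed-width output one would have to step over whole graphemes instead (later
versions of std.uni offer graphemeStride, which might serve in place of
stride here).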
Cheers,
- Daniel