std.string will get the boot
Michel Fortin
michel.fortin at michelf.com
Sat Jan 30 19:31:42 PST 2010
On 2010-01-30 22:06:06 -0500, Lionello Lunesu <lio at lunesu.remove.com> said:
> On 30-1-2010 1:59, Andrei Alexandrescu wrote:
>> bearophile wrote:
>>> Andrei Alexandrescu:
>>>> Currently arrays of characters count as random-access ranges, which
>>>> is not true for arrays of char and wchar. I plan to make std.range
>>>> aware of that and only characterize char[] and wchar[] (and their
>>>> qualified versions) as bidirectional ranges.
>>>
>>> 32 bits are not enough to represent certain "characters", they need
>>> more than one of such dchar. So dchar too may be a bidirectional range.
>>
>> [citation needed]
>
> I also doubt 32-bit is not enough. In fact, Unicode has 0x10FFFF
> as the highest code point.
32 bits are enough to cover all code points. But there are many combining
code points in Unicode, allowing you to combine a diacritic with various
other characters, such as an acute accent with a 'k'. Some of these
combinations also exist in precombined form and are considered equivalent.
So if you want to count the number of characters the user actually sees
instead of counting code points, then you need to take these combining
code points into account.
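To make the point concrete, here is a minimal sketch in Python (not D,
simply because its standard unicodedata module makes this short). The
helper name count_base_code_points is made up for illustration, and
counting code points with combining class 0 is only a rough stand-in for
"characters the user sees", not a real grapheme segmentation:

```python
import unicodedata

# Two encodings of the same user-visible character.
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE, one code point
decomposed = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT, two code points

def count_base_code_points(text):
    """Hypothetical helper: count code points whose canonical combining
    class is 0, as a rough approximation of user-perceived characters."""
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)

# The raw strings differ, but they compare equal once both are normalized.
same_after_nfc = unicodedata.normalize("NFC", decomposed) == precomposed

print(len(precomposed), len(decomposed))   # code point counts differ
print(count_base_code_points(decomposed))  # combining mark is not counted
print(same_after_nfc)
```

The two strings have different lengths in code points, yet they normalize
to the same form, which is why a plain code point count and a "visible
character" count can disagree.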
But if you really wanted to iterate over "characters" instead of code
points, note that it can become quite hard once you take into account
double diacritics: combining diacritical marks placed across two letters.
So I think it's reasonable to have dchar, a code point, as the base
unit for iterating over a string.
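A small illustration of the double diacritic case, again as a Python
sketch. The choice of U+0361 (COMBINING DOUBLE INVERTED BREVE) and the
"t" + "s" pair are my own example, not something from the thread:

```python
import unicodedata

# A double diacritic sits between the two letters it visually spans.
DOUBLE_INVERTED_BREVE = "\u0361"
affricate = "t" + DOUBLE_INVERTED_BREVE + "s"

# List the code points in storage order.
code_points = [f"U+{ord(ch):04X}" for ch in affricate]

# Code points with a non-zero canonical combining class are combining marks.
marks = [ch for ch in affricate if unicodedata.combining(ch) != 0]

print(code_points)
print(len(marks))
```

The mark is stored after the first letter but is drawn over both, so
deciding where one "character" ends and the next begins is no longer a
simple per-code-point test. Iterating by code point sidesteps that
question.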
http://en.wikipedia.org/wiki/Combining_character
http://en.wikipedia.org/wiki/Unicode_normalization
Another interesting case:
http://en.wikipedia.org/wiki/Combining_grapheme_joiner
Unicode, isn't it great?
--
Michel Fortin
michel.fortin at michelf.com
http://michelf.com/