VLERange: a range in between BidirectionalRange and RandomAccessRange
Michel Fortin
michel.fortin at michelf.com
Sat Jan 15 20:45:01 PST 2011
On 2011-01-15 18:59:27 -0500, Andrei Alexandrescu <SeeWebsiteForEmail at erdani.org> said:
> I'm unclear on where this is converging to. At this point the
> commitment of the language and its standard library to (a) UTF array
> representation and (b) code points conceptualization is quite strong.
> Changing that would be quite difficult and disruptive, and the benefits
> are virtually nonexistent for most of D's user base.
There is still disagreement about whether a string type or a plain code
unit array should be the default string representation, and about
whether iterating over a code unit array should give you code unit or
grapheme elements. Of those who participated in the discussion, I don't
think anyone disputes that a grapheme element is better than a dchar
element for iterating over a string.
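To make the three levels concrete, here is a small D sketch. It assumes a later Phobos that provides `std.uni.byGrapheme`, and it relies on narrow strings being iterated as `dchar` by default; the helpers `codeUnits`, `codePoints` and `graphemes` are hypothetical names for illustration. For an "e" followed by U+0301 COMBINING ACUTE ACCENT (one user-perceived character), it should report 3 UTF-8 code units, 2 code points and 1 grapheme.

```d
import std.range : walkLength;
import std.uni : byGrapheme;

// "e" + U+0301 COMBINING ACUTE ACCENT: one user-perceived character.
enum string decomposed = "e\u0301";

// Hypothetical helpers, for illustration only.
size_t codeUnits(string s)  { return s.length; }                // UTF-8 code units (bytes)
size_t codePoints(string s) { return s.walkLength; }            // dchar elements (auto-decoded)
size_t graphemes(string s)  { return s.byGrapheme.walkLength; } // grapheme clusters

void main()
{
    import std.stdio : writeln;
    writeln(codeUnits(decomposed), " ",
            codePoints(decomposed), " ",
            graphemes(decomposed));
}
```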
> It may be more realistic to consider using what we have as back-end for
> grapheme-oriented processing.
> For example:
>
> struct Grapheme(Char) if (isSomeChar!Char)
> {
> private const Char[] rep;
> ...
> }
>
> auto byGrapheme(S)(S s) if (isSomeString!S)
> {
> ...
> }
>
> string s = "Hello";
> foreach (g; byGrapheme(s))
> {
> ...
> }
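The `Grapheme` and `byGrapheme` in the quoted proposal are placeholders. As a rough point of comparison only, a later Phobos ships an opt-in `std.uni.byGrapheme` whose elements are `std.uni.Grapheme` values; unlike the quoted sketch, that type exposes its content as code points (`dchar`) rather than as a slice of the original `Char[]`. The sketch below assumes that API; `codePointsPerGrapheme` is a hypothetical helper that records how many code points each grapheme spans.

```d
import std.uni : byGrapheme;

// Hypothetical helper: number of code points in each grapheme of `s`.
size_t[] codePointsPerGrapheme(string s)
{
    size_t[] counts;
    foreach (g; byGrapheme(s))  // g is a std.uni.Grapheme
        counts ~= g.length;     // Grapheme.length counts code points
    return counts;
}

void main()
{
    import std.stdio : writeln;
    // "Hello" with U+0308 COMBINING DIAERESIS after the final "o".
    writeln(codePointsPerGrapheme("Hello\u0308"));
}
```

With the combining diaeresis attached to the final "o", the last grapheme spans two code points, so the result should be `[1, 1, 1, 1, 2]`.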
No doubt it's easier to implement it that way. The problem is that in
most cases it won't be used. How many people really know what a
grapheme is? Of those, how many will forget to use byGrapheme at one
time or another? As a result, in most programs string manipulation will
misbehave in the presence of combining characters or unnormalized strings.
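This kind of misbehaviour is easy to reproduce. The sketch below, again assuming a later Phobos with `std.uni.byGrapheme`, `byCodePoint` and `normalize`, reverses a decomposed "noël" both ways and compares a precomposed "é" (U+00E9) with "e" + U+0301; the helper names are hypothetical. Reversing by code point moves the diaeresis onto the "l", while reversing by grapheme keeps it on the "e". The two spellings of "é" differ under `==` and compare equal only after NFC normalization.

```d
import std.array : array;
import std.conv : text;
import std.range : retro;
import std.uni : byCodePoint, byGrapheme, normalize, NFC;

// Hypothetical helper: reverse by code point (the default dchar view).
// Combining marks end up attached to the wrong base letter.
string reverseByCodePoint(string s)
{
    return s.retro.text;
}

// Hypothetical helper: reverse by grapheme, keeping each base letter
// together with its combining marks.
string reverseByGrapheme(string s)
{
    return s.byGrapheme.array.retro.byCodePoint.text;
}

// Hypothetical helper: canonically equivalent strings compare equal
// once both are normalized to NFC.
bool sameText(string a, string b)
{
    return normalize!NFC(a) == normalize!NFC(b);
}

void main()
{
    import std.stdio : writeln;
    string noel = "noe\u0308l";                        // "noël", decomposed
    writeln(reverseByCodePoint(noel) == "l\u0308eon"); // diaeresis moved onto "l"
    writeln(reverseByGrapheme(noel) == "le\u0308on");  // "lëon"
    writeln("\u00E9" == "e\u0301");                    // different code points
    writeln(sameText("\u00E9", "e\u0301"));            // equal after NFC
}
```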
If you want to help D programmers write correct code when it comes to
Unicode manipulation, you need to help them iterate over real characters
(graphemes), and you need the algorithms to apply to real characters
(graphemes), not to code points, which are only an approximation of a
Unicode character.
--
Michel Fortin
michel.fortin at michelf.com
http://michelf.com/