VLERange: a range in between BidirectionalRange and RandomAccessRange
foobar
foo at bar.com
Sat Jan 15 23:11:14 PST 2011
Michel Fortin wrote:
> Character literals are treated as simple numbers by the language. By
> that I mean that you can write 'b' - 'a' == 1 and it'll be true.
> Arithmetic makes absolutely no sense for graphemes. If you want a
> special literal for graphemes, I'm afraid you'll have to invent
> something new. And at this point, why not use a string?
>
>
> > Making a new character or grapheme type which represented a grapheme
> > would be _far_ simpler to understand IMO. However, making it work
> > really well would likely require that the compiler know about the
> > grapheme type like it knows about dchar.
>
> I'm looking for a simple solution. One that doesn't involve inventing a
> new grapheme literal syntax or adding new types the compiler must know
> about. I'm not really opposed to any of this, but the more complicated
> the solution is, the less likely it is to be adopted.
>
> All I'm asking is that Unicode strings behave as Unicode strings should
> behave. Making iteration use graphemes by default and string comparison
> use the normalized form by default seems like a simple way to achieve
> that goal.
>
> The most important thing is not the implementation, but that the default
> behaviour be the right behaviour.
>
>
> --
> Michel Fortin
> michel.fortin at michelf.com
> http://michelf.com/
>
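The quoted message makes three technical points: character literals are plain integers, a user-perceived character (grapheme) can span several code points, and string comparison should use a normalized form. A minimal sketch in present-day D illustrates all three. It assumes current Phobos's std.uni (byGrapheme, normalize, NFC), which is not part of the proposal discussed in this thread.

```d
import std.range : walkLength;
import std.stdio : writeln;
import std.uni : byGrapheme, normalize, NFC;

void main()
{
    // Character literals are plain numbers, so arithmetic on them is well defined.
    static assert('b' - 'a' == 1);

    // The same visible character "é", spelled two different ways.
    string decomposed = "e\u0301";  // 'e' + U+0301 COMBINING ACUTE ACCENT
    string precomposed = "\u00E9";  // single code point U+00E9

    writeln(decomposed.length);                // 3 UTF-8 code units
    writeln(decomposed.walkLength);            // 2 code points (dchar)
    writeln(decomposed.byGrapheme.walkLength); // 1 grapheme

    // Comparing code units says the two spellings differ...
    writeln(decomposed == precomposed);        // false
    // ...while comparing normalized (NFC) forms treats them as equal.
    writeln(normalize!NFC(decomposed) == normalize!NFC(precomposed)); // true
}
```

No dchar literal can hold the decomposed spelling, which is why a grapheme has no natural character-literal form.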
I understand your preference for a simpler implementation: you want to minimize the disruption caused by the proposed change.
I'd argue that creating a specialized string type, as Steve suggests, makes integration *easier*. Your suggestion requires changing foreach to iterate over graphemes by default. I agree that this can be done, since the change would not break existing code silently. With Steve's string type, however, it is unnecessary: the type itself would provide a grapheme range interface, so the compiler doesn't need to know about the type at all, and string becomes a regular library type.
Of course, the type should support:
string foo = "bar";
through an implicit conversion from the current array types (to minimize compiler changes).
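A minimal sketch of such a library type in present-day D follows. `UString` is a hypothetical name (the proposal would make `string` itself refer to the new type), and grapheme segmentation is assumed to come from Phobos's std.uni.byGrapheme. A constructor taking `string` lets a declaration like `UString foo = "bar";` compile, and opApply makes `foreach` yield graphemes, all without compiler support.

```d
import std.stdio : writeln;
import std.uni : byGrapheme, Grapheme;

// Hypothetical library string type (the name UString is illustrative only).
struct UString
{
    string data; // UTF-8 storage, identical to today's built-in string

    // Lets a declaration such as `UString foo = "bar";` compile,
    // with no special compiler support for the type.
    this(string s)
    {
        data = s;
    }

    // foreach over a UString yields one Grapheme per user-perceived character.
    int opApply(scope int delegate(Grapheme) dg)
    {
        foreach (g; data.byGrapheme)
        {
            // A nonzero result means the loop body executed break or return.
            if (auto result = dg(g))
                return result;
        }
        return 0;
    }
}

void main()
{
    UString foo = "noe\u0308l"; // "noël" spelled with U+0308 COMBINING DIAERESIS
    size_t count;
    foreach (g; foo) // g is a Grapheme, not a dchar
    {
        ++count;
        writeln(g.length, " code point(s)");
    }
    writeln(count, " graphemes");
}
```

opApply is used here only because it makes the element type of `foreach` unambiguous; a real design might instead expose range primitives so the type also works with std.algorithm.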
As far as I can tell, the only disruption would be code that uses character literals such as 'a' where a string literal such as "a" is now needed, and the compiler will flag every such case once string defaults to the new type. Also, all occurrences of:
string foo = ...;
foreach (c; foo) {...} // c is now a grapheme
will now do the correct thing by default.
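The compile-time breakage mentioned here comes from code that compares an element against a character literal. The following minimal sketch in current D uses Phobos's std.uni.byGrapheme in place of the proposed string type, and `countGrapheme` is a hypothetical helper. It shows a grapheme matched against a string literal such as "e" rather than a character literal such as 'e'.

```d
import std.algorithm.comparison : equal;
import std.stdio : writeln;
import std.uni : byGrapheme;

// Counts the graphemes in `s` that match the user-perceived character `ch`.
// `ch` is a string ("e"), not a character literal ('e'),
// because one grapheme may consist of several code points.
size_t countGrapheme(string s, string ch)
{
    size_t n;
    foreach (g; s.byGrapheme)
    {
        // g[] views the grapheme's code points as a range of dchar.
        if (equal(g[], ch))
            ++n;
    }
    return n;
}

void main()
{
    string word = "noe\u0308l"; // "noël" with a combining diaeresis
    writeln(countGrapheme(word, "e"));       // 0: the third grapheme is "ë", not "e"
    writeln(countGrapheme(word, "e\u0308")); // 1
}
```

A code-point loop such as `foreach (dchar c; word)` testing `c == 'e'` would report a match inside "noël", while the grapheme comparison correctly does not.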