VLERange: a range in between BidirectionalRange and RandomAccessRange

foobar foo at bar.com
Sat Jan 15 23:11:14 PST 2011


Michel Fortin Wrote:


> Character literals are treated as simple numbers by the language. By 
> that I mean that you can write 'b' - 'a' == 1 and it'll be true. 
> Arithmetic makes absolutely no sense for graphemes. If you want a 
> special literal for graphemes, I'm afraid you'll have to invent 
> something new. And at this point, why not use a string?
> 
> 
> > Making a new character or grapheme type which represented a grapheme 
> > would be _far_ simpler to understand IMO. However, making it work 
> > really well would likely require that the compiler know about the 
> > grapheme type like it knows about dchar.
> 
> I'm looking for a simple solution. One that doesn't involve inventing a 
> new grapheme literal syntax or adding new types the compiler must know 
> about. I'm not really opposed to any of this, but the more complicated 
> the solution is, the less likely it is to be adopted.
> 
> All I'm asking is that Unicode strings behave as Unicode strings should 
> behave. Making iteration use graphemes by default and string comparison 
> use the normalized form by default seems like a simple way to achieve 
> that goal.
> 
> The most important thing is not the implementation, but that the default 
> behaviour be the right behaviour.
> 
> 
> -- 
> Michel Fortin
> michel.fortin at michelf.com
> http://michelf.com/
> 

I understand your concern regarding a simpler implementation. You want to minimize the disruption caused by the proposed change.

I'd argue that creating a specialized string type, as Steve suggests, makes integration *easier*. Your suggestion requires changing foreach to default to graphemes. I agree that this can be done, because it will not break silently, but with Steve's string type it is unnecessary: the type itself would provide a grapheme range interface, and the compiler wouldn't need to know about the type at all. string becomes a regular library type.

Of course, the type should support:
string foo = "bar";
via an implicit conversion from the current array types (to minimize compiler changes).

The only disruption, as far as I can tell, would be code that uses 'a'-style literals where "a" would now be required, but that will surface at compile time once string defaults to the new type. Also, all occurrences of:
string foo = ...;
foreach (c; foo) {...} // c is now a grapheme
will now do the correct thing by default.
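
To make the idea concrete, here is a rough sketch of what such a library type might look like. It is only an illustration under some assumptions: the name UString is made up for this example, each grapheme is handed out as a slice of the original string rather than as a dedicated grapheme type, and it relies on graphemeStride from std.uni in a current Phobos, which was not available when this thread was written.

```d
import std.range : walkLength;
import std.uni : graphemeStride;

// Hypothetical library string type; the name UString is an assumption.
struct UString
{
    private string data; // UTF-8 code units, same storage as today's string

    // A plain constructor is enough for `UString foo = "bar";` to compile,
    // so no compiler support is needed for initialization from a literal.
    this(string s) { data = s; }

    // Input range primitives: foreach picks these up automatically.
    @property bool empty() { return data.length == 0; }

    // The current grapheme, returned as a slice of the underlying string.
    @property string front() { return data[0 .. graphemeStride(data, 0)]; }

    // Skip all the code units of the current grapheme.
    void popFront() { data = data[graphemeStride(data, 0) .. $]; }
}

void main()
{
    // 'e' followed by U+0308 (combining diaeresis) is one grapheme.
    UString foo = "noe\u0308l";

    size_t graphemes = 0;
    foreach (c; foo) // c is a grapheme, here a string slice
        ++graphemes;

    assert(graphemes == 4);           // n, o, e + diaeresis, l
    assert(foo.data.walkLength == 5); // code points
    assert(foo.data.length == 6);     // UTF-8 code units
}
```

Note that foreach works on a copy of the struct, so foo itself is not consumed by the loop. A real design would also need comparison on the normalized form, slicing, concatenation and so on; none of that is attempted here.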


