VLERange: a range in between BidirectionalRange and RandomAccessRange

Michel Fortin michel.fortin at michelf.com
Sat Jan 15 12:55:48 PST 2011


On 2011-01-15 15:20:08 -0500, "Steven Schveighoffer" 
<schveiguy at yahoo.com> said:

>> I'm not suggesting we impose it, just that we make it the default. If  
>> you want to iterate by dchar, wchar, or char, just write:
>> 
>> 	foreach (dchar c; "exposé") {}
>> 	foreach (wchar c; "exposé") {}
>> 	foreach (char c; "exposé") {}
>> 	// or
>> 	foreach (dchar c; "exposé".by!dchar()) {}
>> 	foreach (wchar c; "exposé".by!wchar()) {}
>> 	foreach (char c; "exposé".by!char()) {}
>> 
>> and it'll work. But the default would be a slice containing the  
>> grapheme, because this is the right way to represent a Unicode 
>> character.
> 
> I think this is a good idea. I previously was nervous about it, but
> I'm not sure it makes a huge difference. Returning a char[] is
> certainly less work than normalizing a grapheme into one or more code
> points, and then returning them. All that it takes is to detect all
> the code points within the grapheme. Normalization can be done if
> needed, but would probably have to output another char[], since a
> normalized grapheme can occupy more than one dchar.

I'm glad we agree on that now.
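
To make the difference between the views concrete, here is a minimal
sketch using present-day Phobos (std.utf and std.uni) rather than the
by!T syntax proposed above, which is not an existing API. The helper
names are hypothetical; the point is only that one string yields a
different element count per view.

```d
import std.range : walkLength;
import std.uni : byGrapheme;
import std.utf : byChar, byDchar, byWchar;

// Hypothetical helper names; each counts the elements of one view of `s`.
size_t utf8Units(string s)  { return s.byChar.walkLength; }     // UTF-8 code units
size_t utf16Units(string s) { return s.byWchar.walkLength; }    // UTF-16 code units
size_t codePoints(string s) { return s.byDchar.walkLength; }    // code points
size_t graphemes(string s)  { return s.byGrapheme.walkLength; } // grapheme clusters

// The explicitly typed foreach from the quoted example: iterating a
// string with a dchar loop variable decodes code points on the fly.
size_t codePointsViaForeach(string s)
{
    size_t n;
    foreach (dchar c; s)
        ++n;
    return n;
}
```

For "expose\u0301" (the last letter written as 'e' followed by a
combining acute accent), these give 8 UTF-8 code units, 7 UTF-16 code
units, 7 code points, and 6 graphemes.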


> What if I modified my proposed string_t type to return T[] as its
> element type, as you say, and string literals are typed as
> string_t!(whatever)? In addition, the restrictions I imposed on
> slicing a code point actually get imposed on slicing a grapheme. That
> is, it is illegal to substring a string_t in a way that slices through
> a grapheme (and by deduction, a code point)?

I'm not opposed to that on principle. I'm a little uneasy about having
so many types representing a string, however. Some other raw comments:

I agree that things would be more coherent if char[], wchar[], and
dchar[] behaved like other arrays, but then I can't really see a
justification for keeping those types in the language if there's
nothing special about them (why not a library type?). If strings and
arrays of code units are distinct types, slicing a string in the middle
of a grapheme or in the middle of a code point could throw an error.
For performance reasons, though, that check should probably run only
when array bounds checking is turned on, which would require compiler
support.
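
As a rough library-level approximation of that idea, the sketch below
validates slice boundaries only when bounds checking is enabled. It
assumes valid UTF-8 input and relies on std.uni.graphemeStride and the
predefined D_NoBoundsChecks version identifier; isGraphemeBoundary and
graphemeSlice are hypothetical names, and throwing via enforce is just
one possible choice of error.

```d
import std.exception : enforce;
import std.uni : graphemeStride;

// Hypothetical helper: true if `index` (in code units) falls on a
// grapheme boundary of `s`. Walks grapheme by grapheme from the start.
bool isGraphemeBoundary(string s, size_t index)
{
    if (index == 0 || index == s.length)
        return true;
    if (index > s.length)
        return false;
    size_t pos = 0;
    while (pos < index)
        pos += graphemeStride(s, pos);
    return pos == index;
}

// Hypothetical checked slice: the boundary test is compiled out when
// array bounds checks are disabled (-boundscheck=off).
string graphemeSlice(string s, size_t lo, size_t hi)
{
    version (D_NoBoundsChecks) {}
    else
    {
        enforce(lo <= hi && hi <= s.length, "slice out of range");
        enforce(isGraphemeBoundary(s, lo) && isGraphemeBoundary(s, hi),
                "slice cuts through a grapheme");
    }
    return s[lo .. hi];
}
```

The check is linear in the position of the slice, which is part of why
tying it to the bounds-checking switch seems reasonable. A grapheme
boundary is always a code point boundary too, so a single test covers
both cases.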


> Actually, we would need a grapheme to be its own type, because
> comparing two char[]'s that don't contain equivalent bits and having
> them be equal, violates the expectation that char[] is an array.
>
> So the string_t!char would return a grapheme_t!char (names to be
> discussed) as its element type.

Or you could make a grapheme a string_t. ;-)
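
To illustrate the equality concern in the quoted paragraph, here is a
minimal sketch of a wrapper type whose comparison uses canonical
equivalence instead of a bitwise array comparison. GraphemeSlice and
canonicallyEqual are hypothetical names, not the proposed grapheme_t or
string_t, and normalizing both sides to NFC with std.uni.normalize on
every comparison is only the simplest way to do it, not an efficient
one.

```d
import std.uni : NFC, normalize;

// Hypothetical helper: compares two strings after normalizing both to NFC.
bool canonicallyEqual(string a, string b)
{
    return normalize!NFC(a) == normalize!NFC(b);
}

// Hypothetical wrapper around the code units of a single grapheme.
struct GraphemeSlice
{
    string units;

    // Equality follows canonical equivalence, so precomposed and
    // decomposed spellings of the same character compare equal.
    bool opEquals(const GraphemeSlice other) const
    {
        string a = units;
        string b = other.units;
        return canonicallyEqual(a, b);
    }
}
```

With this, GraphemeSlice("\u00E9") equals GraphemeSlice("e\u0301"),
while the underlying arrays "\u00E9" and "e\u0301" still compare as
different, which keeps plain arrays behaving like arrays.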


-- 
Michel Fortin
michel.fortin at michelf.com
http://michelf.com/


