VLERange: a range in between BidirectionalRange and RandomAccessRange

Michel Fortin michel.fortin at michelf.com
Sat Jan 15 20:45:53 PST 2011


On 2011-01-15 20:49:00 -0500, Jonathan M Davis <jmdavisProg at gmx.com> said:

> On Saturday 15 January 2011 04:24:33 Michel Fortin wrote:
>> I have my idea.
>> 
>> I think a good idea would be to improve upon Andrei's first idea --
>> which was to treat char[], wchar[], and dchar[] all as ranges of dchar
>> elements -- by changing the element type to be the same as the string.
>> For instance, iterating on a char[] would give you slices of char[],
>> each having one grapheme.
>> 
>> The second component would be to make the string equality operator (==)
>> compare strings in their normalized form, so that ("e" with
>> combining acute accent) == (pre-combined "é"). I think this would make
>> D support for Unicode much more intuitive.
>> 
>> This implies some semantic changes, mainly that everywhere you write a
>> "character" you must use double quotes (string "a") instead of single
>> quotes (code point 'a'), but from the user's point of view that's pretty
>> much all there is to change.
>> 
>> There'll still be plenty of room for specialized algorithms, but their
>> purpose would be limited to optimization. Correctness would be taken
>> care of by the basic range interface, and foreach should follow suit
>> and iterate by grapheme by default.
>> 
>> I wrote this example (or something similar) earlier in this thread:
>> 
>> 	foreach (grapheme; "exposé")
>> 		if (grapheme == "é")
>> 			break;
>> 
>> In this example, even if one of these two strings uses the pre-combined
>> form of "é" and the other uses a combining acute accent, the equality
>> would still hold since foreach iterates on full graphemes and ==
>> compares using normalization.
> 
> I think that that would cause definite problems. Having the element 
> type of the range be the same type as the range seems like it could 
> cause a lot of problems in std.algorithm and the like, and it's 
> _definitely_ going to confuse programmers. I'd expect it to be highly 
> bug-prone. They _need_ to be separate types.

I remember that someone already complained about this issue because he 
had a tree of ranges, and Andrei said he would take a look at this 
problem eventually. Perhaps now would be a good time.


> Now, given that dchar can't actually work completely as an element 
> type, you'd either need the string type to be a new type or the element 
> type to be a new type. So, either the string type has char[], wchar[], 
> or dchar[] for its element type, or char[], wchar[], and dchar[] have 
> something like uchar as their element type, where uchar is a struct 
> which contains a char[], wchar[], or dchar[]
> which holds a single grapheme.

Having a new type for graphemes would work too. My preference still goes 
to reusing the string type because it makes the semantics simpler to 
understand, especially when comparing graphemes with literals.
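A minimal sketch of the proposed semantics, emulated with library calls instead of a change to the built-in `foreach` and `==`. It assumes a present-day Phobos whose `std.uni` provides `graphemeStride` and `normalize` (both postdate this thread); `graphemeSlices` and `sameText` are hypothetical helper names. Each element is a `string` slice of the original holding one grapheme, and equality is checked on NFC-normalized forms, so both spellings of "exposé" match.

```d
import std.stdio : writeln;
import std.uni : graphemeStride, normalize, NormalizationForm;

/// Hypothetical helper: split a UTF-8 string into slices of the original,
/// each holding exactly one grapheme cluster (element type == string type).
string[] graphemeSlices(string s)
{
    string[] result;
    size_t i = 0;
    while (i < s.length)
    {
        // Length, in UTF-8 code units, of the grapheme starting at index i.
        immutable len = graphemeStride(s, i);
        result ~= s[i .. i + len];
        i += len;
    }
    return result;
}

/// Hypothetical helper: compare two strings by their NFC-normalized forms,
/// so precomposed and combining-mark spellings of a character are equal.
bool sameText(string a, string b)
{
    return normalize!(NormalizationForm.NFC)(a)
        == normalize!(NormalizationForm.NFC)(b);
}

void main()
{
    // The same visible word "exposé", spelled two different ways.
    string precombined = "expos\u00E9";  // é as U+00E9 (precomposed)
    string combining = "expose\u0301";   // e followed by U+0301 COMBINING ACUTE ACCENT

    foreach (word; [precombined, combining])
    {
        // Mirrors the foreach example quoted earlier in the thread.
        foreach (grapheme; graphemeSlices(word))
            if (sameText(grapheme, "\u00E9"))
            {
                writeln("found é in a ", word.length, "-byte string");
                break;
            }
    }
}
```

Plain `==` on "e\u0301" and "\u00E9" is still false here, because built-in string comparison works code unit by code unit; that gap is what making `==` normalization-aware would close.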


-- 
Michel Fortin
michel.fortin at michelf.com
http://michelf.com/



More information about the Digitalmars-d mailing list