VLERange: a range in between BidirectionalRange and RandomAccessRange
Michel Fortin
michel.fortin at michelf.com
Sat Jan 15 20:45:53 PST 2011
On 2011-01-15 20:49:00 -0500, Jonathan M Davis <jmdavisProg at gmx.com> said:
> On Saturday 15 January 2011 04:24:33 Michel Fortin wrote:
>> I have my idea.
>>
>> I think it'd be a good idea to improve upon Andrei's first idea --
>> which was to treat char[], wchar[], and dchar[] all as ranges of dchar
>> elements -- by changing the element type to be the same as the string.
>> For instance, iterating on a char[] would give you slices of char[],
>> each having one grapheme.
>>
>> The second component would be to make the equality operator (==)
>> for strings compare them in their normalized form, so that ("e" with
>> combining acute accent) == (pre-combined "é"). I think this would make
>> D support for Unicode much more intuitive.
>>
>> This implies some semantic changes, mainly that everywhere you write a
>> "character" you must use double-quotes (string "a") instead of single
>> quote (code point 'a'), but from the user's point of view that's pretty
>> much all there is to change.
>>
>> There'll still be plenty of room for specialized algorithms, but their
>> purpose would be limited to optimization. Correctness would be taken
>> care of by the basic range interface, and foreach should follow suit
>> and iterate by grapheme by default.
>>
>> I wrote this example (or something similar) earlier in this thread:
>>
>>     foreach (grapheme; "exposé")
>>         if (grapheme == "é")
>>             break;
>>
>> In this example, even if one of these two strings uses the pre-combined
>> form of "é" and the other uses a combining acute accent, the equality
>> would still hold since foreach iterates on full graphemes and ==
>> compares using normalization.
>
> I think that that would cause definite problems. Having the element
> type of the range be the same type as the range seems like it could
> cause a lot of problems in std.algorithm and the like, and it's
> _definitely_ going to confuse programmers. I'd expect it to be highly
> bug-prone. They _need_ to be separate types.

I remember that someone already complained about this issue because he
had a tree of ranges, and Andrei said he would take a look at this
problem eventually. Perhaps now would be a good time.

> Now, given that dchar can't actually work completely as an element
> type, you'd either need the string type to be a new type or the element
> type to be a new type. So, either the string type has char[], wchar[],
> or dchar[] for its element type, or char[], wchar[], and dchar[] have
> something like uchar as their element type, where uchar is a struct
> which contains a char[], wchar[], or dchar[] which holds a single
> grapheme.

Having a new type for grapheme would work too. My preference still goes
to reusing the string type because it makes the semantics simpler to
understand, especially when comparing graphemes with literals.
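
To make the intended semantics concrete, here is a minimal sketch, not a proposal for the actual implementation. It assumes a Phobos whose std.uni provides byGrapheme and normalize, and it emulates "iterate by grapheme, compare in normalized form" with two hypothetical helpers, normalizedEqual and containsGrapheme, since the built-in == and foreach do not behave this way.

```d
import std.array : array;
import std.conv : to;
import std.uni : byGrapheme, normalize, NFC;

// Hypothetical helper: equality on the NFC-normalized forms of two strings.
bool normalizedEqual(string a, string b)
{
    return normalize!NFC(a) == normalize!NFC(b);
}

// Hypothetical helper: true if some grapheme of `haystack` equals `needle`
// once both are normalized to NFC.
bool containsGrapheme(string haystack, string needle)
{
    auto wanted = normalize!NFC(needle.to!dstring);
    foreach (g; haystack.byGrapheme)
    {
        // g[] yields the code points that make up one grapheme.
        if (normalize!NFC(g[].array) == wanted)
            return true;
    }
    return false;
}

unittest
{
    // "e" + combining acute accent vs. pre-combined "é".
    assert(normalizedEqual("e\u0301", "\u00E9"));
    // The haystack uses the pre-combined form, the needle the combining form.
    assert(containsGrapheme("expos\u00E9", "e\u0301"));
    // A plain "e" is a different grapheme from "é".
    assert(!containsGrapheme("expose", "\u00E9"));
}
```

With the change I'm suggesting, the two helpers would disappear: the foreach and the == in my earlier example would do this work themselves.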
--
Michel Fortin
michel.fortin at michelf.com
http://michelf.com/