VLERange: a range in between BidirectionalRange and RandomAccessRange
Michel Fortin
michel.fortin at michelf.com
Sat Jan 15 12:55:48 PST 2011
On 2011-01-15 15:20:08 -0500, "Steven Schveighoffer"
<schveiguy at yahoo.com> said:
>> I'm not suggesting we impose it, just that we make it the default. If
>> you want to iterate by dchar, wchar, or char, just write:
>>
>> foreach (dchar c; "exposé") {}
>> foreach (wchar c; "exposé") {}
>> foreach (char c; "exposé") {}
>> // or
>> foreach (dchar c; "exposé".by!dchar()) {}
>> foreach (wchar c; "exposé".by!wchar()) {}
>> foreach (char c; "exposé".by!char()) {}
>>
>> and it'll work. But the default would be a slice containing the
>> grapheme, because this is the right way to represent a Unicode
>> character.
>
> I think this is a good idea. I previously was nervous about it, but
> I'm not sure it makes a huge difference. Returning a char[] is
> certainly less work than normalizing a grapheme into one or more code
> points, and then returning them. All that it takes is to detect all
> the code points within the grapheme. Normalization can be done if
> needed, but would probably have to output another char[], since a
> normalized grapheme can occupy more than one dchar.

I'm glad we agree on that now.

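To make the "slice containing the grapheme" idea concrete, here is a minimal sketch. It assumes present-day Phobos, where std.uni provides graphemeStride (nothing like it shipped when this thread was written). The helper name graphemeSlices is hypothetical; it only illustrates that each element can be a plain slice of the original code units, with no normalization or copying.

```d
import std.uni : graphemeStride;

/// Hypothetical helper: returns one slice of `s` per grapheme cluster.
/// The slices alias the original code units; nothing is normalized.
string[] graphemeSlices(string s)
{
    string[] result;
    size_t i = 0;
    while (i < s.length)
    {
        // Number of code units in the grapheme that starts at index i.
        immutable n = graphemeStride(s, i);
        result ~= s[i .. i + n];
        i += n;
    }
    return result;
}

unittest
{
    // "expose" followed by U+0301 COMBINING ACUTE ACCENT (decomposed é).
    auto parts = graphemeSlices("expose\u0301");
    assert(parts.length == 6);        // 6 graphemes, although 8 UTF-8 code units
    assert(parts[5] == "e\u0301");    // base letter and accent stay together
}
```

Compiling with `dmd -unittest -main` runs the check. A lazy range would be the more natural shape for a default foreach; the eager array is used here only to keep the sketch short.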
> What if I modified my proposed string_t type to return T[] as its
> element type, as you say, and string literals are typed as
> string_t!(whatever)? In addition, the restrictions I imposed on
> slicing a code point actually get imposed on slicing a grapheme. That
> is, it is illegal to substring a string_t in a way that slices through
> a grapheme (and by deduction, a code point)?

I'm not opposed to that in principle. I'm a little uneasy about having
so many types representing a string, however. Some other raw comments:

I agree that things would be more coherent if char[], wchar[], and
dchar[] behaved like other arrays, but I can't really see a
justification for those types to be in the language if there's nothing
special about them (why not a library type?). If strings and arrays of
code units are distinct, slicing in the middle of a grapheme or in the
middle of a code point could throw an error, but for performance
reasons it should probably check for that only when array bounds
checking is turned on (that would require compiler support, however).

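As a rough library-level approximation of that check, the sketch below puts the grapheme-boundary test in an `in` contract. Contracts are compiled out with -release, which is only a stand-in for tying the check to the bounds-checking switch (that part would still need compiler support). The names onGraphemeBoundary and graphemeSlice are hypothetical, and graphemeStride is assumed from present-day std.uni.

```d
import std.uni : graphemeStride;

/// Hypothetical: true if `index` lies on a grapheme boundary of `s`.
/// Walks from the start of the string, so it costs O(index).
bool onGraphemeBoundary(string s, size_t index)
{
    if (index > s.length)
        return false;
    size_t i = 0;
    while (i < index)
        i += graphemeStride(s, i);
    // If index was inside a grapheme, the walk has stepped past it.
    return i == index;
}

/// Hypothetical checked slice: the contract rejects cuts through a
/// grapheme (and therefore through a code point) in non-release builds.
string graphemeSlice(string s, size_t lo, size_t hi)
in
{
    assert(lo <= hi && hi <= s.length, "slice out of range");
    assert(onGraphemeBoundary(s, lo), "lo cuts through a grapheme");
    assert(onGraphemeBoundary(s, hi), "hi cuts through a grapheme");
}
do
{
    return s[lo .. hi];
}

unittest
{
    string s = "expose\u0301";                  // decomposed é at the end
    assert(graphemeSlice(s, 5, 8) == "e\u0301");
    assert(!onGraphemeBoundary(s, 6));          // between 'e' and the accent
    assert(!onGraphemeBoundary(s, 7));          // inside the accent's code point
}
```

The linear scan is the reason to keep the check out of release builds: an ordinary bounds check is constant time, whereas this one depends on how far into the string the slice starts.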
> Actually, we would need a grapheme to be its own type, because
> comparing two char[]'s that don't contain equivalent bits and having
> them be equal, violates the expectation that char[] is an array.
>
> So the string_t!char would return a grapheme_t!char (names to be
> discussed) as its element type.

Or you could make a grapheme a string_t. ;-)

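For what the distinct-type argument looks like in code, here is a minimal sketch of a grapheme wrapper whose equality is canonical equivalence rather than bitwise comparison. The type name GraphemeSlice is hypothetical, and comparing NFC-normalized forms via std.uni.normalize (present-day Phobos) is just one assumed way to define that equality.

```d
import std.uni : normalize, NFC;

/// Hypothetical grapheme type: wraps a slice of code units but compares
/// by canonical equivalence, which a bare char[] should not do.
struct GraphemeSlice
{
    string units;

    bool opEquals(const GraphemeSlice rhs) const
    {
        // Copy to mutable-head locals so normalize sees plain strings.
        string a = units;
        string b = rhs.units;
        return normalize!NFC(a) == normalize!NFC(b);
    }
    // A matching toHash is omitted here; it would be required before
    // using this type as an associative-array key.
}

unittest
{
    string precomposed = "\u00E9";   // é as a single code point
    string decomposed  = "e\u0301";  // e + COMBINING ACUTE ACCENT

    // As arrays of code units the two differ...
    assert(precomposed != decomposed);
    // ...but as graphemes they are the same character.
    assert(GraphemeSlice(precomposed) == GraphemeSlice(decomposed));
}
```

Whether this wrapper is a separate grapheme_t or simply a one-grapheme string_t, the point is the same: the surprising equality lives on a type other than the raw array.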
--
Michel Fortin
michel.fortin at michelf.com
http://michelf.com/