VLERange: a range in between BidirectionalRange and
foobar
foo at bar.com
Sat Jan 15 14:19:48 PST 2011
Steven Schveighoffer Wrote:
> On Sat, 15 Jan 2011 15:55:48 -0500, Michel Fortin
> <michel.fortin at michelf.com> wrote:
>
> > On 2011-01-15 15:20:08 -0500, "Steven Schveighoffer"
> > <schveiguy at yahoo.com> said:
> >
> >>> I'm not suggesting we impose it, just that we make it the default. If
> >>> you want to iterate by dchar, wchar, or char, just write:
> >>> foreach (dchar c; "exposé") {}
> >>> foreach (wchar c; "exposé") {}
> >>> foreach (char c; "exposé") {}
> >>> // or
> >>> foreach (dchar c; "exposé".by!dchar()) {}
> >>> foreach (wchar c; "exposé".by!wchar()) {}
> >>> foreach (char c; "exposé".by!char()) {}
> >>> and it'll work. But the default would be a slice containing the
> >>> grapheme, because this is the right way to represent a Unicode
> >>> character.
> >> I think this is a good idea. I previously was nervous about it, but
> >> I'm not sure it makes a huge difference. Returning a char[] is
> >> certainly less work than normalizing a grapheme into one or more code
> >> points, and then returning them. All that it takes is to detect all
> >> the code points within the grapheme. Normalization can be done if
> >> needed, but would probably have to output another char[], since a
> >> normalized grapheme can occupy more than one dchar.
> >
> > I'm glad we agree on that now.
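
To make the three levels concrete, here is a minimal sketch that counts code units, code points and graphemes in the same string. The `by!dchar()` syntax quoted above is only a proposal; as a stand-in for the proposed grapheme default, the sketch assumes a Phobos that provides `std.uni.byGrapheme`. The helper names are made up for illustration.

```d
import std.range : walkLength;
import std.uni : byGrapheme; // assumption: a Phobos version that ships byGrapheme

/// Number of UTF-8 code units: what `foreach (char c; s)` visits.
size_t countCodeUnits(string s)
{
    size_t n;
    foreach (char c; s) ++n;
    return n;
}

/// Number of code points: what `foreach (dchar c; s)` visits.
size_t countCodePoints(string s)
{
    size_t n;
    foreach (dchar c; s) ++n;
    return n;
}

/// Number of graphemes (user-perceived characters).
size_t countGraphemes(string s)
{
    return s.byGrapheme.walkLength;
}

unittest
{
    // Final letter written as 'e' + U+0301 COMBINING ACUTE ACCENT.
    string s = "expose\u0301";
    assert(countCodeUnits(s) == 8);  // U+0301 takes two bytes in UTF-8
    assert(countCodePoints(s) == 7); // 'e' and the accent are separate
    assert(countGraphemes(s) == 6);  // but they form one grapheme
}
```

Only the grapheme count matches what a reader would call the number of characters, which is the argument for making it the default.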
>
> It's a matter of me slowly wrapping my brain around Unicode and how it's
> used. It seems like a typical committee-defined standard where there
> are 10 ways to do everything; I was trying to weed out the lesser-used (or
> so I perceived) pieces to allow a more implementable library. It's doubly
> hard for me since I have limited experience with other languages, and I've
> never tried to write them with a computer (my language classes in high
> school were back in the days of actually writing stuff down on paper).
>
> I once told a colleague who was on a standards committee that their
> proposed KLV standard (key length value) was ridiculous. The wise
> committee had decided that in order to avoid future issues, the length
> would be encoded as a single byte if < 128, or otherwise as a first byte
> of 128 + the size of the length field, followed by the length field
> itself. This means you could potentially have to parse and process a
> 127-byte integer!
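
For reference, a rough sketch of a decoder for the length field described above. The quoted text does not give the byte order, so big-endian (as in BER) is an assumption here, and the function name is made up. The sketch rejects length fields wider than 8 bytes, which is exactly the part of the format that makes a 127-byte integer possible.

```d
import std.exception : enforce;

/// Decodes one length field and reports how many bytes it occupied.
/// Assumption: multi-byte lengths are stored big-endian.
ulong decodeLength(const(ubyte)[] data, out size_t consumed)
{
    enforce(data.length > 0, "empty input");
    immutable first = data[0];

    // Short form: values below 128 are the length itself.
    if (first < 0x80)
    {
        consumed = 1;
        return first;
    }

    // Long form: the low 7 bits give the size of the length field.
    immutable size_t n = first & 0x7F;
    enforce(n >= 1 && n <= ulong.sizeof, "length field too wide for this sketch");
    enforce(data.length >= 1 + n, "truncated length field");

    ulong value = 0;
    foreach (b; data[1 .. 1 + n])
        value = (value << 8) | b;
    consumed = 1 + n;
    return value;
}

unittest
{
    size_t used;
    ubyte[] shortForm = [0x05];
    ubyte[] longForm = [0x82, 0x01, 0x00];
    assert(decodeLength(shortForm, used) == 5 && used == 1);
    assert(decodeLength(longForm, used) == 256 && used == 3);
}
```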
>
> >
> >
> >> What if I modified my proposed string_t type to return T[] as its
> >> element type, as you say, and string literals are typed as
> >> string_t!(whatever)? In addition, the restrictions I imposed on
> >> slicing a code point actually get imposed on slicing a grapheme. That
> >> is, it is illegal to substring a string_t in a way that slices through
> >> a grapheme (and by deduction, a code point)?
> >
> > I'm not opposed to that on principle. I'm a little uneasy about having
> > so many types representing a string however. Some other raw comments:
> >
> > I agree that things would be more coherent if char[], wchar[], and
> > dchar[] behaved like other arrays, but I can't really see a
> > justification for those types to be in the language if there's nothing
> > special about them (why not a library type?).
>
> I would not be opposed to getting rid of those types. But I am very
> opposed to char[] not being an array. If you want a string to be
> something other than an array, make it have a different syntax. We also
> have to consider C compatibility.
>
> However, we are in radical-change mode then, and this is probably pushed
> to D3 ;) If we can find some way to fix the situation without
> invalidating TDPL, we should strive for that first IMO.
>
> > If strings and arrays of code units are distinct, slicing in the middle
> > of a grapheme or in the middle of a code point could throw an error, but
> > for performance reasons it should probably check for that only when
> > array bounds checking is turned on (that would require compiler support
> > however).
>
> Not really, it could use assert, but that throws an assert error instead
> of a RangeError. Of course, both are errors and will abort the program.
> I do wish there was a version(noboundscheck) to do this kind of stuff
> with...
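
A minimal sketch of the assert-based check, limited to code point boundaries in UTF-8 (a grapheme boundary check would need full segmentation on top of this). `CheckedString` is a made-up name, not the proposed string_t. The sketch also assumes a compiler that predefines the `D_NoBoundsChecks` version identifier, which would cover the version(noboundscheck) wish.

```d
/// True if index `i` is the start of a code point, or the end of `s`.
bool onCodePointBoundary(string s, size_t i)
{
    // UTF-8 continuation bytes have the bit pattern 10xxxxxx.
    return i == s.length || (s[i] & 0xC0) != 0x80;
}

/// Hypothetical string wrapper; the name is invented for this sketch.
struct CheckedString
{
    string data;

    CheckedString opSlice(size_t lo, size_t hi) const
    {
        // Assumption: D_NoBoundsChecks is set when bounds checks are off,
        // so the extra validation disappears together with them.
        version (D_NoBoundsChecks) {}
        else
        {
            assert(lo <= hi && hi <= data.length, "slice out of range");
            assert(onCodePointBoundary(data, lo) && onCodePointBoundary(data, hi),
                   "slice splits a code point");
        }
        return CheckedString(data[lo .. hi]);
    }
}

unittest
{
    auto s = CheckedString("expos\u00E9"); // U+00E9 is two code units: C3 A9
    assert(s[0 .. 5].data == "expos");
    assert(s[5 .. 7].data == "\u00E9");
    // s[0 .. 6] would trip the assert, because index 6 is inside U+00E9.
    assert(!onCodePointBoundary(s.data, 6));
}
```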
>
> >> Actually, we would need a grapheme to be its own type, because
> >> comparing two char[]'s that don't contain equivalent bits and having
> >> them be equal, violates the expectation that char[] is an array.
> >> So the string_t!char would return a grapheme_t!char (names to be
> >> discussed) as its element type.
> >
> > Or you could make a grapheme a string_t. ;-)
>
> I'm a little uneasy having a range return itself as its element type. For
> all intents and purposes, a grapheme is a string of one 'element', so it
> could potentially be a string_t.
>
> It does seem daunting to have so many types, but at the same time, types
> convey relationships at compile time that can make code impossible to
> get wrong, or make things possible that a single type would not allow.
>
> I'll give you an example from a previous life:
>
> Tango had a type called DateTime. This type represented *either* a point
> in time, or a span of time (depending on how you used it). But I proposed
> we switch to two distinct types, one for a point in time, one for a span
> of time. It was argued that both were so similar, why couldn't we just
> keep one type? The answer is simple -- having them be separate types
> allows me to express relationships that the compiler enforces. For
> example, you can add two time spans together, but you can't add two points
> in time together. Or maybe you want a function to accept a time span
> (like a sleep operation). If there was only one type, then
> sleep(DateTime.now()) compiles and sleeps for what, 2011 years? ;)
>
> I feel that making extra types when the relationship between them is
> important is worth the possible repetition of functionality. Catching
> bugs during compilation is soooo much better than experiencing them during
> runtime.
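
The point/span split can be sketched in a few lines. `TimePoint`, `TimeSpan` and `sleepFor` are hypothetical names for illustration, not Tango's actual types. The static asserts show the compiler rejecting the two mistakes mentioned above.

```d
/// A length of time (hypothetical type for this sketch).
struct TimeSpan
{
    long ticks;

    // span + span and span - span are both spans
    TimeSpan opBinary(string op)(TimeSpan rhs) const
        if (op == "+" || op == "-")
    {
        return TimeSpan(mixin("ticks " ~ op ~ " rhs.ticks"));
    }
}

/// A point in time (hypothetical type for this sketch).
struct TimePoint
{
    long ticks;

    // point +/- span is another point
    TimePoint opBinary(string op)(TimeSpan rhs) const
        if (op == "+" || op == "-")
    {
        return TimePoint(mixin("ticks " ~ op ~ " rhs.ticks"));
    }

    // point - point is the span between them
    TimeSpan opBinary(string op)(TimePoint rhs) const
        if (op == "-")
    {
        return TimeSpan(ticks - rhs.ticks);
    }
}

/// Placeholder for a sleep call: it only accepts a span.
void sleepFor(TimeSpan d) {}

unittest
{
    assert((TimeSpan(2) + TimeSpan(3)).ticks == 5);
    assert((TimePoint(10) + TimeSpan(5)).ticks == 15);
    assert((TimePoint(10) - TimePoint(4)).ticks == 6);

    // Adding two points in time does not compile...
    static assert(!__traits(compiles, TimePoint(1) + TimePoint(2)));
    // ...and neither does sleeping "until now".
    static assert(!__traits(compiles, sleepFor(TimePoint(1))));
}
```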
>
> -Steve
I like Michel's proposed semantics, and I also agree with you that it should be a distinct string type that does not break the consistency of regular arrays.
Regarding your last point: do you mean that a grapheme would be a sub-type of string (a specialization where the string represents a single element)? If so, then it sounds good to me.