Go and generic programming on reddit, also touches on D
Steven Schveighoffer
schveiguy at yahoo.com
Mon Sep 19 08:52:03 PDT 2011
On Mon, 19 Sep 2011 11:03:15 -0400, Timon Gehr <timon.gehr at gmx.ch> wrote:
> On 09/19/2011 04:43 PM, Steven Schveighoffer wrote:
>> On Mon, 19 Sep 2011 10:24:33 -0400, Timon Gehr <timon.gehr at gmx.ch>
>> wrote:
>>
>>> On 09/19/2011 04:02 PM, Steven Schveighoffer wrote:
>>>>
>>>> So I think it's not only limiting to require x.length to be $, it's
>>>> very wrong in some cases.
>>>>
>>>> Also, think of a string. It has no length (well technically, it does,
>>>> but it's not the number of elements), but it has a distinct end
>>>> point. A properly written string type would fail to compile if $ was
>>>> s.length.
>>>>
>>>
>>> But you'd have to compute the length anyways in the general case:
>>>
>>> str[0..$/2];
>>>
>>> Or am I misunderstanding something?
>>>
>>
>> That's half the string in code units, not code points.
>>
>> If string was properly implemented, this would fail to compile. $ is not
>> the length of the string range (meaning the number of code points). The
>> given slice operation might actually create an invalid string.
>
> Programmers have to be aware of that if they want efficient code that
> deals with unicode. I think having random access to the code units and
> being able to iterate per code point is fine, because it gives you the
> best of both worlds. Manually decoding a string and slicing it at
> positions that were remembered to be safe has been good enough for me,
> at least it is efficient.

I find the same. I don't think I've ever used arbitrary arithmetic to
compute string slice bounds like the above. I only slice a string when
I know the bounds are sane.

Like I said, it's a compromise. The "right" thing to do is probably to
not even allow code-unit access via index (some have even argued that
code-point slicing is too dangerous, because you can split a grapheme,
leaving a valid, but incorrect slice of the original).

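To make the code-unit versus code-point distinction concrete, here is a
minimal sketch, assuming a current Phobos (std.utf and std.range) and a
UTF-8 encoded source file. The sample string is only an illustration:

```d
import std.exception : assertThrown;
import std.range : walkLength;
import std.utf : stride, validate, UTFException;

void main()
{
    string s = "héllo";          // 'é' occupies two UTF-8 code units

    assert(s.length == 6);       // number of code units
    assert(s.walkLength == 5);   // number of code points

    // Slicing at an arbitrary code-unit offset can cut 'é' in half,
    // which leaves an invalid UTF-8 sequence.
    string bad = s[0 .. 2];
    assertThrown!UTFException(validate(bad));

    // Slicing at an offset found by decoding stays on a boundary.
    size_t i = 1;                // offset of the first code unit of 'é'
    i += stride(s, i);           // step over the whole code point
    assert(s[0 .. i] == "hé");
}
```

The second slice is the "bounds I know are sane" case: the offset comes
from walking the string, not from arithmetic on its length.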
>> It's tricky, because you want fast slicing, but only certain slices are
>> valid. I once created a string type that used a char[] as its backing,
>> but actually implemented the limitations that std.range tries to enforce
>> (but cannot). It's somewhat of a compromise. If $ was mapped to
>> s.length, it would fail to compile, but I'm not sure what I *would* use
>> for $. It actually might be the code units, which would not make the
>> above line invalid.
>>
>> -Steve
>
> Well it would have to be consistent for a string type that "does it
> right". Either the string is indexed with units or it is indexed with
> code points, and the other option should be provided. Dollar should just
> be the length of what is used for indexing/slicing here, and having that
> be different from length makes for a somewhat awkward interface imho.

Except we are defining a string as a *range* and a range's length is
defined as the number of elements.

Note that hasLength!string evaluates to false in std.range.
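
The hasLength claim can be checked at compile time. A small sketch,
assuming a current Phobos in which strings are still treated as ranges
of code points:

```d
import std.range : hasLength, isBidirectionalRange, isRandomAccessRange,
    walkLength;

// The built-in array has a .length property, but std.range deliberately
// reports that string, viewed as a range, has no length.
static assert(!hasLength!string);
static assert(!isRandomAccessRange!string);
static assert(isBidirectionalRange!string);

// An array of ints has no such mismatch.
static assert(hasLength!(int[]));
static assert(isRandomAccessRange!(int[]));

void main()
{
    string s = "héllo";
    assert(s.length == 6);      // code units in the backing array
    assert(s.walkLength == 5);  // elements (code points) of the range
}
```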

$ should denote the end point of the aggregate, but it does not have to be
equivalent to length, or even an integer/uint. It should just mean "end".
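
As a rough sketch of what a non-integer "end" could look like, the
following uses opDollar with a marker type. It assumes a current D
compiler; EndString and End are hypothetical names made up for this
illustration, not an existing library type:

```d
// Hypothetical string wrapper in which $ means "the end", not a number.
struct EndString
{
    private string data;

    // Marker type: it carries no value and supports no arithmetic.
    static struct End {}

    // Inside s[...], $ is rewritten to s.opDollar.
    enum opDollar = End.init;

    // s[start .. $]: slice from a code-unit offset to the end.
    string opSlice(size_t start, End)
    {
        return data[start .. $];  // here $ is the built-in array length
    }
}

void main()
{
    auto s = EndString("héllo");
    assert(s[3 .. $] == "llo");   // offset 3 is just past the two-unit 'é'

    // s[0 .. $ / 2] is rejected by the compiler, because End defines
    // no division operator.
}
```

With this design the length question never comes up: the type exposes
an end point without claiming to know how many elements precede it.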

I also proposed a while back to have ^ denote the beginning (similar to
regex) of an aggregate for aggregates that don't use 0 as the beginning,
but people didn't like it :)

-Steve