Go and generic programming on reddit, also touches on D

Mon Sep 19 09:00:46 PDT 2011

On 09/19/2011 05:52 PM, Steven Schveighoffer wrote:
> On Mon, 19 Sep 2011 11:03:15 -0400, Timon Gehr <timon.gehr at gmx.ch> wrote:
>
>> On 09/19/2011 04:43 PM, Steven Schveighoffer wrote:
>>> On Mon, 19 Sep 2011 10:24:33 -0400, Timon Gehr <timon.gehr at gmx.ch> wrote:
>>>
>>>> On 09/19/2011 04:02 PM, Steven Schveighoffer wrote:
>>>>>
>>>>> So I think it's not only limiting to require x.length to be $, it's
>>>>> very wrong in some cases.
>>>>>
>>>>> Also, think of a string. It has no length (well technically, it does,
>>>>> but it's not the number of elements), but it has a distinct end point.
>>>>> A properly written string type would fail to compile if $ was s.length.
>>>>>
>>>>
>>>> But you'd have to compute the length anyways in the general case:
>>>>
>>>> str[0..$/2];
>>>>
>>>> Or am I misunderstanding something?
>>>>
>>>
>>> That's half the string in code units, not code points.
>>>
>>> If string was properly implemented, this would fail to compile. $ is not
>>> the length of the string range (meaning the number of code points). The
>>> given slice operation might actually create an invalid string.
>>
>> Programmers have to be aware of that if they want efficient code that
>> deals with unicode. I think having random access to the code units and
>> being able to iterate per code point is fine, because it gives you the
>> best of both worlds. Manually decoding a string and slicing it at
>> positions that were remembered to be safe has been good enough for me,
>> at least it is efficient.
>
> I find the same. I don't think I've ever dealt with arbitrary math
> operations to do slices of strings like the above. I only slice a string
> when I know the bounds are sane.
>
> Like I said, it's a compromise. The "right" thing to do is probably not
> even allow code-unit access via index (some have even argued that
> code-point slicing is too dangerous, because you can split a grapheme,
> leaving a valid, but incorrect slice of the original).
>
>>> It's tricky, because you want fast slicing, but only certain slices are
>>> valid. I once created a string type that used a char[] as its backing,
>>> but actually implemented the limitations that std.range tries to enforce
>>> (but cannot). It's somewhat of a compromise. If $ was mapped to
>>> s.length, it would fail to compile, but I'm not sure what I *would* use
>>> for $. It actually might be the code units, which would not make the
>>> above line invalid.
>>>
>>> -Steve
>>
>> Well it would have to be consistent for a string type that "does it
>> right". Either the string is indexed with units or it is indexed with
>> code points, and the other option should be provided. Dollar should
>> just be the length of what is used for indexing/slicing here, and
>> having that be different from length makes for a somewhat awkward
>> interface imho.
>
> Except we are defining a string as a *range* and a range's length is
> defined as the number of elements.
>
> Note that hasLength!string evaluates to false in std.range.

Ok. I feel the way narrow strings are handled in Phobos is a reasonable
trade-off.
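
To make the trade-off concrete, here is a minimal sketch. The sample
string is my own choice and not taken from the discussion above. It
shows that a narrow string has a code-unit length but no range length,
and that the str[0..$/2] slice from earlier can cut a code point in half:

```d
import std.exception : assertThrown;
import std.range : hasLength, walkLength;
import std.utf : UTFException, validate;

void main()
{
    // Assumed example input: U+00E4 occupies two UTF-8 code units.
    string s = "h\u00E4h";

    // As a range, string reports no length ...
    static assert(!hasLength!string);

    // ... although the underlying array has one, counted in code units.
    assert(s.length == 4);

    // Iterating as a range decodes per code point.
    assert(s.walkLength == 3);

    // $ is s.length here, so $ / 2 == 2 falls inside the two-unit sequence.
    auto half = s[0 .. $ / 2];
    assert(half.length == 2);

    // The slice compiles and runs, but it is no longer valid UTF-8.
    assertThrown!UTFException(validate(half));
}
```

Slicing stays O(1) on code units, and the price is that the programmer
has to remember which positions are safe.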

>
> $ should denote the end point of the aggregate, but it does not have to
> be equivalent to length, or even an integer/uint. It should just mean
> "end".

Point taken. What is the solution for infinite ranges? Should any
arithmetic on $ just be disallowed?
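
One possible answer, as a rough sketch only: let $ evaluate to a
dedicated token type that defines no operators, so r[n .. $] works while
any arithmetic on $ is rejected at compile time. The names End and
Naturals below are hypothetical and made up for illustration; this
assumes a compiler that supports opDollar for user-defined types.

```d
// Hypothetical marker type for "the end"; it deliberately has no operators.
struct End {}

// Hypothetical infinite range over the natural numbers.
struct Naturals
{
    size_t start;

    enum bool empty = false;               // never empty: infinite range
    @property size_t front() const { return start; }
    void popFront() { ++start; }

    // $ inside r[...] evaluates to this token instead of a number.
    enum opDollar = End.init;

    // r[n .. $] drops the first n elements; the result is still infinite.
    Naturals opSlice(size_t lo, End) const { return Naturals(start + lo); }
}

void main()
{
    auto r = Naturals(0);

    auto tail = r[3 .. $];
    assert(tail.front == 3);

    // Arithmetic on $ does not type-check, because End has no operators.
    static assert(!__traits(compiles, r[0 .. $ / 2]));
    static assert(!__traits(compiles, r[0 .. $ - 1]));
}
```

With this design $ really does just mean "end", and whether $ - 1 or
$ / 2 makes sense is decided by which operators the token type provides.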

>
> I also proposed a while back to have ^ denote the beginning (similar to
> regex) of an aggregate for aggregates that don't use 0 as the beginning,
> but people didn't like it :)
>
> -Steve

=D, well, it is grammatically unambiguous!