Go and generic programming on reddit, also touches on D

Steven Schveighoffer schveiguy at yahoo.com
Mon Sep 19 08:52:03 PDT 2011


On Mon, 19 Sep 2011 11:03:15 -0400, Timon Gehr <timon.gehr at gmx.ch> wrote:

> On 09/19/2011 04:43 PM, Steven Schveighoffer wrote:
>> On Mon, 19 Sep 2011 10:24:33 -0400, Timon Gehr <timon.gehr at gmx.ch> wrote:
>>
>>> On 09/19/2011 04:02 PM, Steven Schveighoffer wrote:
>>>>
>>>> So I think it's not only limiting to require x.length to be $, it's
>>>> very wrong in some cases.
>>>>
>>>> Also, think of a string. It has no length (well technically, it does,
>>>> but it's not the number of elements), but it has a distinct end point.
>>>> A properly written string type would fail to compile if $ was s.length.
>>>>
>>>
>>> But you'd have to compute the length anyways in the general case:
>>>
>>> str[0..$/2];
>>>
>>> Or am I misunderstanding something?
>>>
>>
>> That's half the string in code units, not code points.
>>
>> If string was properly implemented, this would fail to compile. $ is not
>> the length of the string range (meaning the number of code points). The
>> given slice operation might actually create an invalid string.
>
> Programmers have to be aware of that if they want efficient code that  
> deals with unicode. I think having random access to the code units and  
> being able to iterate per code point is fine, because it gives you the  
> best of both worlds. Manually decoding a string and slicing it at  
> positions that were remembered to be safe has been good enough for me,  
> at least it is efficient.

I find the same.  I don't think I've ever dealt with arbitrary math  
operations to do slices of strings like the above.  I only slice a string  
when I know the bounds are sane.
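
To make the difference concrete, here is a minimal sketch of the two kinds of slice. The helper names (isValidUtf8, firstHalf, sliceBefore) are made up for illustration; only indexOf and validate come from Phobos.

```d
import std.string : indexOf;
import std.utf : validate, UTFException;

// Hypothetical helper: true if s is well-formed UTF-8.
bool isValidUtf8(string s)
{
    try { validate(s); return true; }
    catch (UTFException) { return false; }
}

// Hypothetical helper: the "arbitrary math" slice, measured in code units.
string firstHalf(string s) { return s[0 .. $ / 2]; }

// Hypothetical helper: slice up to an offset that a search reported,
// so the cut always lands on a code point boundary.
string sliceBefore(string s, dchar c)
{
    immutable i = s.indexOf(c);
    return i < 0 ? s : s[0 .. i];
}

void main()
{
    // Code units: e-acute (0-1), 'a' (2), e-acute (3-4), 'b' (5).
    string s = "\u00E9a\u00E9b";
    assert(s.length == 6);

    // $/2 is 3, which cuts the second e-acute in half.
    assert(!isValidUtf8(firstHalf(s)));

    // The offset of 'b' is a known-safe position.
    assert(sliceBefore(s, 'b') == "\u00E9a\u00E9");
    assert(isValidUtf8(sliceBefore(s, 'b')));
}
```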

Like I said, it's a compromise.  The "right" thing to do is probably to not  
even allow code-unit access via index (some have even argued that  
code-point slicing is too dangerous, because you can split a grapheme,  
leaving a valid but incorrect slice of the original).
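
A small sketch of the grapheme problem, assuming a Phobos recent enough to have std.uni.byGrapheme (the graphemes helper is a made-up name):

```d
import std.range : walkLength;
import std.uni : byGrapheme;
import std.utf : validate;

// Hypothetical helper: number of user-perceived characters in s.
size_t graphemes(string s) { return s.byGrapheme.walkLength; }

void main()
{
    // 'e' followed by U+0301 (combining acute accent): one visible character.
    string s = "e\u0301";
    assert(s.length == 3);       // code units
    assert(s.walkLength == 2);   // code points
    assert(graphemes(s) == 1);   // graphemes

    // Cutting after the first code point is a perfectly valid slice...
    string cut = s[0 .. 1];
    validate(cut);               // does not throw
    // ...but it silently drops the accent.
    assert(cut == "e");
}
```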

>> It's tricky, because you want fast slicing, but only certain slices are
>> valid. I once created a string type that used a char[] as its backing,
>> but actually implemented the limitations that std.range tries to enforce
>> (but cannot). It's somewhat of a compromise. If $ was mapped to
>> s.length, it would fail to compile, but I'm not sure what I *would* use
>> for $. It actually might be the code units, which would not make the
>> above line invalid.
>>
>> -Steve
>
> Well it would have to be consistent for a string type that "does it  
> right". Either the string is indexed with units or it is indexed with  
> code points, and the other option should be provided. Dollar should just  
> be the length of what is used for indexing/slicing here, and having that  
> be different from length makes for a somewhat awkward interface imho.

Except we are defining a string as a *range* and a range's length is  
defined as the number of elements.

Note that hasLength!string evaluates to false in std.range.
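
For reference, a quick check of what that means in practice. The codeUnits and codePoints helpers are just illustrative names:

```d
import std.range : hasLength, walkLength;

// string has a .length property, but std.range does not count it as a
// range length, because it is not the number of range elements.
static assert(!hasLength!string);
// An ordinary array is fine: length equals the number of elements.
static assert(hasLength!(int[]));

// Hypothetical helpers naming the two different counts.
size_t codeUnits(string s)  { return s.length; }
size_t codePoints(string s) { return s.walkLength; }

void main()
{
    string s = "h\u00E9llo";     // e-acute takes two UTF-8 code units
    assert(codeUnits(s) == 6);
    assert(codePoints(s) == 5);
}
```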

$ should denote the end point of the aggregate, but it does not have to be  
equivalent to length, or even an integer/uint.  It should just mean "end".
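
Something like the following is what I have in mind. It is only a sketch: End and Str are hypothetical types, and it assumes a compiler that supports opDollar on user-defined types.

```d
// Hypothetical marker type: it means "end" and nothing else.
struct End {}

// Hypothetical wrapper around a UTF-8 string, sliced by code-unit offset.
struct Str
{
    string data;

    // What `$` evaluates to inside s[...]; deliberately not a number.
    End opDollar() { return End(); }

    // s[from .. $]: slice up to the end marker.
    string opSlice(size_t from, End) { return data[from .. $]; }

    // s[from .. to]: slice between two offsets the caller already knows.
    string opSlice(size_t from, size_t to) { return data[from .. to]; }
}

void main()
{
    auto s = Str("h\u00E9llo");
    assert(s[3 .. $] == "llo");
    assert(s[0 .. 3] == "h\u00E9");

    // Arithmetic on the end marker is rejected at compile time,
    // because End defines no division operator.
    static assert(!__traits(compiles, s[0 .. $ / 2]));
}
```

With this shape, $ still works as an end point, but str[0..$/2] from earlier in the thread no longer compiles.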

I also proposed a while back to have ^ denote the beginning (similar to  
regex) of an aggregate for aggregates that don't use 0 as the beginning,  
but people didn't like it :)

-Steve

