Questions about the slice operator

Wed Apr 4 06:01:05 PDT 2012

On Wed, 04 Apr 2012 14:16:54 +0200, Simen Kjærås <simen.kjaras at gmail.com> wrote:

> On Wed, 04 Apr 2012 12:06:33 +0200, Jacob Carlborg <doob at me.com> wrote:
>
>> On 2012-04-04 04:11, Jonathan M Davis wrote:
>>
>>> foreach(i; 0 .. 5)
>>>
>>> is more efficient only because it has _nothing_ to do with arrays.
>>> Generalizing the syntax wouldn't help at all, and if it were
>>> generalized, it would arguably have to be consistent in all of its
>>> uses, in which case
>>>
>>> foreach(i; 0 .. 5)
>>>
>>> would become identical to
>>>
>>> foreach(i; [0, 1, 2, 3, 4])
>>>
>>> and therefore less efficient. Generalizing .. just doesn't make sense.
>>
>> Why couldn't the .. syntax be syntax sugar for some kind of
>> library-implemented range type, just as is done with associative
>> arrays?
>>
>> We could implement a new library type, named "range", looking
>> something like this:
>>
>> struct range
>> {
>>      size_t start;
>>      size_t end;
>>      // implement the range interface or opApply
>> }
>>
>> range r = 1 .. 5;
>>
>> The above line would be syntax sugar for:
>>
>> range r = range(1, 5);
>>
>> void foo (range r)
>> {
>>      foreach (e ; r) {}
>> }
>>
>> foo(r);
>>
>> This could then be taken advantage of in other parts of the language:
>>
>> class A
>> {
>>      int opSlice (range r); // new syntax
>>      int opSlice (size_t start, size_t end); // old syntax
>> }
>>
>> I think this would be completely backwards compatible as well.
>>
>
> And what do we do with 3..$?

Actually, I've thought a little about this. And apart from the tiny
idiosyncrasy of $, a..b as a more regular type can bring some
interesting enhancements to the language.

Consider a..b as simply a set of indices, defined by a start point and
an end point. A different index set may be [1,2,4,5], or Strided!(3,4).

An index set then works as a filter on a range, returning only those
elements whose indices are in the set.

We can now redefine opIndex to take either a single index or an index
set, as follows:

auto opIndex(S)(S set) if (isIndexSet!S) {
     return set.transform(this);
}

For an AA, there would be another constraint that the type of elements
of the index set match those of the AA keys, of course. Other containers
may have other constraints.

An index set may or may not be iterable, but it should always supply
functionality to check if an index is contained in it.

With this framework laid out, we can define these operations on arrays,
and have any array be sliceable by an array of integral elements:

assert(['a','b','c'][[0,2]] == ['a', 'c']);
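
To make the idea concrete, here is a minimal sketch in today's D. Since
the built-in array opIndex cannot be overloaded, a free function stands
in for it. Interval, IndexList, isIndexSet and select are all
hypothetical names chosen for illustration, and the only thing assumed
of an index set is a contains(size_t) member:

```d
// Hypothetical index set: the half-open interval [start, end), i.e. a..b.
struct Interval
{
    size_t start, end;

    bool contains(size_t i) const { return start <= i && i < end; }
}

// Hypothetical index set: an explicit list of indices such as [0, 2].
struct IndexList
{
    const(size_t)[] indices;

    bool contains(size_t i) const
    {
        foreach (j; indices)
            if (j == i) return true;
        return false;
    }
}

// Hypothetical trait: an index set is anything that can answer
// "is index i contained in you?".
enum isIndexSet(S) = is(typeof(S.init.contains(size_t.init)) : bool);

// Stand-in for the proposed opIndex overload: keep only those elements
// whose indices are in the set, in the order they appear in data.
T[] select(T, S)(T[] data, S set) if (isIndexSet!S)
{
    T[] result;
    foreach (i, e; data)
        if (set.contains(i))
            result ~= e;
    return result;
}

unittest
{
    assert(select(['a', 'b', 'c'], IndexList([size_t(0), 2])) == ['a', 'c']);
    assert(select([10, 20, 30, 40], Interval(1, 3)) == [20, 30]);
}
```

Note that this follows the filter description above: the result keeps
the order of the container, not the order of the index list.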

The problem of $ is a separate one, and quite complex to handle. No
doubt it is useful for arrays and their ilk, but for the generic array
and index set, it is hard to define cleanly.

Barring the use of expression templates, I see few other solutions than
to introduce the function opDollar(size_t level), where level is 0 for
the first index ([$]), 1 for the second ([_, $]), etc. This means there
is no way to express the concept of next-to-last element outside of the
opSlice call.
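
A rough sketch of the opDollar idea for a two-dimensional container
follows. Grid and its members are hypothetical, and the level is passed
as a compile-time parameter here, which is an assumption on my part
about how it would be spelled rather than part of the proposal above:

```d
// Hypothetical row-major 2-D container, for illustration only.
struct Grid
{
    int[] data;
    size_t rows, cols;

    // Length along the dimension in which $ appears:
    // level 0 for [$, _], level 1 for [_, $].
    size_t opDollar(size_t level)() const
    {
        static if (level == 0)
            return rows;
        else
            return cols;
    }

    int opIndex(size_t r, size_t c) const
    {
        return data[r * cols + c];
    }
}

unittest
{
    auto g = Grid([1, 2, 3, 4, 5, 6], 2, 3);
    assert(g[$ - 1, $ - 1] == 6); // last row, last column
    assert(g[0, $ - 2] == 2);     // first row, next-to-last column
}
```

Here $ only has a meaning inside the brackets, where the compiler knows
which container and which position it belongs to. Something like
auto i = $ - 2; outside the index expression has nothing to refer to.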

A different solution would be to use a specific type for $. Basically,
this would be:

struct Dollar(T) {
     T offset;
     alias offset this;
     // operator overloads here to ensure typeof($+n) == typeof($)
}

This complicates things a lot, and still does not really work.
[1,2,3][0..foo($)] works in D today, but would not with the proposed
type. Hence, using $ outside slice operations likely should not
(indeed, cannot) be supported.
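
One possible reading of such a type, as a simplified sketch: the value
stores only an offset relative to the end, and the container resolves
it once the length is known. This drops the template parameter and the
alias this from the struct above, uses a signed offset so that $-1 is
representable, and resolve is a hypothetical helper, so treat it as an
illustration rather than the proposal itself:

```d
// Simplified, hypothetical take on a dedicated type for $.
struct Dollar
{
    ptrdiff_t offset; // distance from the end; 0 means $ itself

    // Keep typeof($ + n) == typeof($).
    Dollar opBinary(string op)(ptrdiff_t n) const
        if (op == "+" || op == "-")
    {
        return Dollar(mixin("offset " ~ op ~ " n"));
    }

    // Only the container knows its length, so only it can turn the
    // relative offset into an absolute index.
    size_t resolve(size_t length) const
    {
        return cast(size_t)(cast(ptrdiff_t) length + offset);
    }
}

unittest
{
    auto last = Dollar() - 1;
    assert(last.offset == -1);
    assert(last.resolve(3) == 2); // [1,2,3][$-1] would be element 2
}
```

Under this reading, an arbitrary function such as foo($) receives a
value that does not yet know the length, which is one way to see why
the expression stops working.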