RFC on range design for D2

Mon Sep 8 22:24:50 PDT 2008

On Mon, 08 Sep 2008 23:53:17 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail at erdani.org> wrote:

> Robert Jacques wrote:
>> On Mon, 08 Sep 2008 20:37:41 -0400, Andrei Alexandrescu  
>> <SeeWebsiteForEmail at erdani.org> wrote:
>>> Denis Koroskin wrote:
>>>> 3) Walter mentioned that built-in array could be re-implemented using  
>>>> a pair of pointers instead of ptr+length. Will it ever get a green  
>>>> light? It fits range concept much better.
>>>
>>> Walter told me to first implement my design, and if it works, he'll do  
>>> the change. Yes, it does fit ranges much better because the often-used  
>>> next and, um, pop will only touch one word instead of two.
>>  I'd warn that changing away from ptr+length would create logical  
>> incosistencies between 1D arrays and 2D/3D/ND arrays.
>
> How so?

An ND array is typically defined as a fat pointer like so:
struct array(T,size_t N) {
     T* ptr;
     size_t[N]    lengths;      // of each dimension
     ptrdiff_t[N] byte_strides; // of each dimension
}
So a 1D array is
{
     T* ptr;
     size_t lengths;
     ptrdiff_t byte_strides = T.sizeof; //Currently a compile time constant  
in the built-in array
     size_t length() { return lengths; }
}
which is logically consistent with a general dense matrix and aside from  
some name change and the stride being a compile time constant, is  
identical to the current D arrays.
However, { T* first; T* last } may not be logically extended to ND arrays,  
particularly sliced ND arrays, as T* last no longer has any meaning.

>>>> 4) We need some way of supporting dollar notation in user containers.  
>>>> The hack of using __dollar is bad (although it works).
>>>
>>> It doesn't work for multiple dimensions. There should be an  
>>> opDollar(uint dim) that gives the library information on which  
>>> argument count it occured in. Consider:
>>>
>>> auto x = matrix[$-1, $-1];
>>>
>>> Here the dollar's occurrences have different meanings. A good start  
>>> would be to expand the above into:
>>>
>>> auto x = matrix[matrix.opDollar(0)-1, matrix.opDollar(1)-1];
>>  I'd also add that multiple dimension slicing should be supported. i.e.
>> auto x = matrix[2..5,0..$,3]
>> would become
>> auto x =  
>> matrix.opSlice(Slice!(size_t)(2,5),Slice!(size_t)(0,matrix.opDollar(0)),3)
>> with
>> struct Slice (T) { T start; T end; }
>> Strided slices would also be nice. i.e. matrix[0..$:10] // decimate the  
>> array
>
> Multidimensional slicing can be implemented with staggered indexing:
>
> matrix[2..5][0..$][3]

Yes, but doing so utilizes expression templates and is relatively slow:
matrix_row_slice temp1 = matrix.opSlice(2,5);
matrix_col_slice temp2 = temp1.opSlice(0,$);
matrix                 = temp2.opIndex(3);

And causes code bloat. Worst matrix[2..5] by itself would be an unstable  
type. Either foo(matrix[2..5]) would not compile or it would generate code  
bloat and hard to find logic bugs. (Due to the fact that you've embedded  
the dimension of the slice operation into the type).

> means: first, take a slice 2..5 that returns a matrix range one  
> dimension smaller. Then, for that type take a slice from 0 to $. And so  
> on.
>
> This works great for row-wise storage. I'm not sure how efficient it  
> would be for other storage schemes.

No it doesn't. It works great for standard C arrays of arrays, but these  
are not matrices and have a large number of well documented performance  
issues when used as such. In general, multi-dimentional data structures  
relatively common and should be cleanly supported.

> Note how nice the distinction between the container and its views works:  
> there is only one matrix. But there are many ranges and subranges within  
> it, bearing various relationships with one another.

Yes, Data+View (i.e MVC for data structures) is a good thing(TM). But  
generally, matrices have been views into data and not the data themselves.  
(unless needed for memory management, etc)