RFC on range design for D2

Tue Sep 9 07:28:13 PDT 2008

On Tue, 09 Sep 2008 07:06:55 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail at erdani.org> wrote:

> Robert Jacques wrote:
>> On Mon, 08 Sep 2008 23:53:17 -0400, Andrei Alexandrescu  
>> <SeeWebsiteForEmail at erdani.org> wrote:
>>
>>> Robert Jacques wrote:
>>>> On Mon, 08 Sep 2008 20:37:41 -0400, Andrei Alexandrescu  
>>>> <SeeWebsiteForEmail at erdani.org> wrote:
>>>>> Denis Koroskin wrote:
>>>>>> 3) Walter mentioned that built-in array could be re-implemented  
>>>>>> using a pair of pointers instead of ptr+length. Will it ever get a  
>>>>>> green light? It fits range concept much better.
>>>>>
>>>>> Walter told me to first implement my design, and if it works, he'll  
>>>>> do the change. Yes, it does fit ranges much better because the  
>>>>> often-used next and, um, pop will only touch one word instead of two.
>>>>  I'd warn that changing away from ptr+length would create logical  
>>>> incosistencies between 1D arrays and 2D/3D/ND arrays.
>>>
>>> How so?
>>  An ND array is typically defined as a fat pointer like so:
>> struct array(T,size_t N) {
>>     T* ptr;
>>     size_t[N]    lengths;      // of each dimension
>>     ptrdiff_t[N] byte_strides; // of each dimension
>> }
>> So a 1D array is
>> {
>>     T* ptr;
>>     size_t lengths;
>>     ptrdiff_t byte_strides = T.sizeof; //Currently a compile time  
>> constant in the built-in array
>>     size_t length() { return lengths; }
>> }
>> which is logically consistent with a general dense matrix and aside  
>> from some name change and the stride being a compile time constant, is  
>> identical to the current D arrays.
>> However, { T* first; T* last } may not be logically extended to ND  
>> arrays, particularly sliced ND arrays, as T* last no longer has any  
>> meaning.
>
> Hmmm, I see. That could become a problem if we wanted lower-dimensional  
> matrices to be prefixes of higher-dimensional matrices. This is a worthy  
> goal, but one that my matrices don't pursue.
>
>>>>>> 4) We need some way of supporting dollar notation in user  
>>>>>> containers. The hack of using __dollar is bad (although it works).
>>>>>
>>>>> It doesn't work for multiple dimensions. There should be an  
>>>>> opDollar(uint dim) that gives the library information on which  
>>>>> argument count it occured in. Consider:
>>>>>
>>>>> auto x = matrix[$-1, $-1];
>>>>>
>>>>> Here the dollar's occurrences have different meanings. A good start  
>>>>> would be to expand the above into:
>>>>>
>>>>> auto x = matrix[matrix.opDollar(0)-1, matrix.opDollar(1)-1];
>>>>  I'd also add that multiple dimension slicing should be supported.  
>>>> i.e.
>>>> auto x = matrix[2..5,0..$,3]
>>>> would become
>>>> auto x =  
>>>> matrix.opSlice(Slice!(size_t)(2,5),Slice!(size_t)(0,matrix.opDollar(0)),3)  
>>>> with
>>>> struct Slice (T) { T start; T end; }
>>>> Strided slices would also be nice. i.e. matrix[0..$:10] // decimate  
>>>> the array
>>>
>>> Multidimensional slicing can be implemented with staggered indexing:
>>>
>>> matrix[2..5][0..$][3]
>>  Yes, but doing so utilizes expression templates and is relatively slow:
>> matrix_row_slice temp1 = matrix.opSlice(2,5);
>> matrix_col_slice temp2 = temp1.opSlice(0,$);
>> matrix                 = temp2.opIndex(3);
>>  And causes code bloat. Worst matrix[2..5] by itself would be an  
>> unstable type. Either foo(matrix[2..5]) would not compile or it would  
>> generate code bloat and hard to find logic bugs. (Due to the fact that  
>> you've embedded the dimension of the slice operation into the type).
>
> What is an unstable type?

What I meant, is that the type is fundamentally not designed to exist by  
itself. And therefore if not paired with the other slices, you've put your  
program into an a danger state.

> There is no use of expression templates,

So what would you call embedding an operation into a very temporary type  
that's not expected to last beyond the line of it's return?

> but indeed multiple slices are created. This isn't as bad as it seems  
> because the desire was to access several elements, so the slice is  
> supposed to be around for long enough to justify its construction cost.

True, but it's still less efficient.

> I agree it would be onerous to access a single element with e.g.  
> matrix[1][1][2].

And what about the code bloat and compile time or runtime logic bugs that  
this design is prone to produce?

>>> means: first, take a slice 2..5 that returns a matrix range one  
>>> dimension smaller. Then, for that type take a slice from 0 to $. And  
>>> so on.
>>>
>>> This works great for row-wise storage. I'm not sure how efficient it  
>>> would be for other storage schemes.
>>  No it doesn't. It works great for standard C arrays of arrays, but  
>> these are not matrices and have a large number of well documented  
>> performance issues when used as such. In general, multi-dimentional  
>> data structures relatively common and should be cleanly supported.
>
> [Citation needed]

Imperfect C++: Practical Solutions for Real-Life Programming by Matthew  
Wilson has an entire chapter dedicated to the performance of matrices in  
C++, comparing Boost, the STL, plain arrays and a few structures of his  
own. Also, arrays of arrays don't support O(1) slicing, resize and  
creation, use more memory, etc.

As for multi-dimentional data structures, they are used heavily by the  
fields of games, graphics, scientific computing, databases and I've  
probably forgotten some others.