RFC on range design for D2
Robert Jacques
sandford at jhu.edu
Tue Sep 9 07:28:13 PDT 2008
On Tue, 09 Sep 2008 07:06:55 -0400, Andrei Alexandrescu
<SeeWebsiteForEmail at erdani.org> wrote:
> Robert Jacques wrote:
>> On Mon, 08 Sep 2008 23:53:17 -0400, Andrei Alexandrescu
>> <SeeWebsiteForEmail at erdani.org> wrote:
>>
>>> Robert Jacques wrote:
>>>> On Mon, 08 Sep 2008 20:37:41 -0400, Andrei Alexandrescu
>>>> <SeeWebsiteForEmail at erdani.org> wrote:
>>>>> Denis Koroskin wrote:
>>>>>> 3) Walter mentioned that built-in array could be re-implemented
>>>>>> using a pair of pointers instead of ptr+length. Will it ever get a
>>>>>> green light? It fits range concept much better.
>>>>>
>>>>> Walter told me to first implement my design, and if it works, he'll
>>>>> do the change. Yes, it does fit ranges much better because the
>>>>> often-used next and, um, pop will only touch one word instead of two.
>>>> I'd warn that changing away from ptr+length would create logical
>>>> incosistencies between 1D arrays and 2D/3D/ND arrays.
>>>
>>> How so?
>> An ND array is typically defined as a fat pointer like so:
>> struct array(T,size_t N) {
>> T* ptr;
>> size_t[N] lengths; // of each dimension
>> ptrdiff_t[N] byte_strides; // of each dimension
>> }
>> So a 1D array is
>> {
>> T* ptr;
>> size_t lengths;
>> ptrdiff_t byte_strides = T.sizeof; //Currently a compile time
>> constant in the built-in array
>> size_t length() { return lengths; }
>> }
>> which is logically consistent with a general dense matrix and aside
>> from some name change and the stride being a compile time constant, is
>> identical to the current D arrays.
>> However, { T* first; T* last } may not be logically extended to ND
>> arrays, particularly sliced ND arrays, as T* last no longer has any
>> meaning.
>
> Hmmm, I see. That could become a problem if we wanted lower-dimensional
> matrices to be prefixes of higher-dimensional matrices. This is a worthy
> goal, but one that my matrices don't pursue.
>
>>>>>> 4) We need some way of supporting dollar notation in user
>>>>>> containers. The hack of using __dollar is bad (although it works).
>>>>>
>>>>> It doesn't work for multiple dimensions. There should be an
>>>>> opDollar(uint dim) that gives the library information on which
>>>>> argument count it occured in. Consider:
>>>>>
>>>>> auto x = matrix[$-1, $-1];
>>>>>
>>>>> Here the dollar's occurrences have different meanings. A good start
>>>>> would be to expand the above into:
>>>>>
>>>>> auto x = matrix[matrix.opDollar(0)-1, matrix.opDollar(1)-1];
>>>> I'd also add that multiple dimension slicing should be supported.
>>>> i.e.
>>>> auto x = matrix[2..5,0..$,3]
>>>> would become
>>>> auto x =
>>>> matrix.opSlice(Slice!(size_t)(2,5),Slice!(size_t)(0,matrix.opDollar(0)),3)
>>>> with
>>>> struct Slice (T) { T start; T end; }
>>>> Strided slices would also be nice. i.e. matrix[0..$:10] // decimate
>>>> the array
>>>
>>> Multidimensional slicing can be implemented with staggered indexing:
>>>
>>> matrix[2..5][0..$][3]
>> Yes, but doing so utilizes expression templates and is relatively slow:
>> matrix_row_slice temp1 = matrix.opSlice(2,5);
>> matrix_col_slice temp2 = temp1.opSlice(0,$);
>> matrix = temp2.opIndex(3);
>> And causes code bloat. Worst matrix[2..5] by itself would be an
>> unstable type. Either foo(matrix[2..5]) would not compile or it would
>> generate code bloat and hard to find logic bugs. (Due to the fact that
>> you've embedded the dimension of the slice operation into the type).
>
> What is an unstable type?
What I meant, is that the type is fundamentally not designed to exist by
itself. And therefore if not paired with the other slices, you've put your
program into an a danger state.
> There is no use of expression templates,
So what would you call embedding an operation into a very temporary type
that's not expected to last beyond the line of it's return?
> but indeed multiple slices are created. This isn't as bad as it seems
> because the desire was to access several elements, so the slice is
> supposed to be around for long enough to justify its construction cost.
True, but it's still less efficient.
> I agree it would be onerous to access a single element with e.g.
> matrix[1][1][2].
And what about the code bloat and compile time or runtime logic bugs that
this design is prone to produce?
>>> means: first, take a slice 2..5 that returns a matrix range one
>>> dimension smaller. Then, for that type take a slice from 0 to $. And
>>> so on.
>>>
>>> This works great for row-wise storage. I'm not sure how efficient it
>>> would be for other storage schemes.
>> No it doesn't. It works great for standard C arrays of arrays, but
>> these are not matrices and have a large number of well documented
>> performance issues when used as such. In general, multi-dimentional
>> data structures relatively common and should be cleanly supported.
>
> [Citation needed]
Imperfect C++: Practical Solutions for Real-Life Programming by Matthew
Wilson has an entire chapter dedicated to the performance of matrices in
C++, comparing Boost, the STL, plain arrays and a few structures of his
own. Also, arrays of arrays don't support O(1) slicing, resize and
creation, use more memory, etc.
As for multi-dimentional data structures, they are used heavily by the
fields of games, graphics, scientific computing, databases and I've
probably forgotten some others.
More information about the Digitalmars-d-announce
mailing list