Flat multi-dim arrays.

Thu Aug 10 13:22:37 PDT 2006

Mikola Lysenko wrote:
> I agree that a multi-dimensional array structure would be extremely
> useful, however there are a few questions that must be answered before the
> syntax gets changed.
> 

I don't think this has to be overly complicated, even to "get it (mostly) right". If you look at the 
[,] syntax as just another way of expressing [][] (with a different allocation strategy) then what 
is built-in to the compiler internals already might work just fine. It's really easy to allocated MD 
arrays out of a contiguous pool and that's where the speed and ease of implementation benefits come 
from (not only in D but probably Fortran as well).

> 1.  How should foreach work?
> 
> There are two main possibilities, either foreach iterates over each
> element, or else it iterates over arrays of one less rank. Unfortunately,
> these behaviors are mutually exclusive so it is important to choose
> carefully.  I personally prefer the view that a multi-array is like an
> associative array of tuples of ints. Under this perspective, the first
> option is more sensible, but the second approach is more consistent with
> existing dynamic array semantics.

The original post demo'd the first because if they want the 2nd they can use the jagged array syntax.

> 
> 2.  How should array concatenation or resizing be applied?
> 
> Concatenating to a multi-dimensional array seems difficult, since it is
> inevitable that a user will attempt to grow an array along any of its
> dimensions.  This must trigger a large and costly memory recopy if it is
> not along the last rank.  A similar issue could happen if a user attempts
> to change one of the dimensions in the middle ranks.  The memory layout
> would become dramatically altered, and the entire array would need to be
> copied/shifted.
> 

What does Fortran do now? If it's not offered built-in in other high-performance, statically 
compiled languages then don't do it. If it is, then this actually sounds like a great place for 
library support because it will usually be an expensive single operation and can probably not be 
done more efficiently "built-in".

> 3.  Should it be possible to dynamically change the rank of a multi-array?
> 
> This can present many hazards, since there is no safe way to retain the
> contents of an array across such a transformation.  One solution is to
> embed the rank into the type of the array, making it immutable.

I would say no, not built-in to start with.

> 
> 4.  How should the new types memory be laid out?
> 
> In the current specification for D, a dynamic array's memory looks like
> this:
> 
> struct DynArray(T)
> {
> 	size_t len;
> 	T * arr;
> }
> 

The original post suggested a way to lay them out "flat" in memory and then use the internal 
representation to access the elements the same way as jagged arrays. So, no new internal structure 
is used (large jagged arrays are slow because of their memory layout, not because of how the 
references to the elements are calculated, etc.).

> In a multi-array, there are 2 possible scenarios:
> 
> struct MultiArray(T)
> {
> 	size_t rank;
> 	size_t * dimensions;
> 	T * arr;
> }
> 

I've tried this... It is slower w/ both DMD and GDC - apparently 'fragments' the memory used to 
lookup and then access the elements. Whatever the case the current compiler implementations don't do 
this efficiently.

> In this situation, we can dynamically adjust the rank and dimension of the
> multi-array, at the expense of allocating 2 blocks of dynamic memory.  If
> we make rank static, we get the following:
> 
> struct MultiArray(T, RANK)
> {
> 	size_t dims[RANK];
> 	T * arr;
> }
> 
> This is slightly simpler, but rank is fixed.  It also has the advantage
> that a rank-1 multi-array is byte for byte equal to a dynamic array.
> 

Tried this too. It is potentially pretty expensive - passing "large" structs by val. Using a class 
internally means another thing to allocate, slower member access and more complexity internal to the 
compiler.

> Finally, we need to decide on row major or column major layout.  This is a
> tricky issue because there is no "best" answer.
> 
> 
> 5.  How do slices work?
> 
> You can't just slice a random chunk from the middle of an array in place
> like a regular array, since the layout will be incorrect.  By necessity,
> the memory will have to be copied, which is transforms the copy-on-write
> idiom into a dangerous new copy-on-read situation.
> 
> 
> Before the language gets changed, there needs to be more discussion of the
> fundamental issues surrounding multi-arrays.
> 
> 

Slices on the last dimension could be done like so for example:

int arr[,] = new int[10,100], ar2 = new int[10,10];

ar2[2,0..$] = arr[9,0..10]; // internally converted to ar2[2][0..$] = arr[9][0..10];
int[] ia = arr[5,0..10].dup; // internally converted to int[] ia = arr[5][0..10].dup;

Obviously that doesn't cover all of the possibilities.

Thanks,

- Dave