Multidimensional Arrays
Oskar Linde
oskar.lindeREM at OVEgmail.com
Fri Mar 16 11:19:06 PDT 2007
Bill Baxter skrev:
> Oskar Linde wrote:
>> I have been implementing a strided multidimensional array type and am
>> interested in comments.
>>
>> It is basically designed after Norbert Nemec's suggestion:
>> http://homepages.uni-regensburg.de/~nen10015/documents/D-multidimarray.html
>>
>> +/- some things.
>>
>> The basic array type looks like this:
>>
>> struct Array(T,int dims = 1) {
>> T* ptr;
>> size_t[dims] range;
>> ptrdiff_t[dims] stride;
>>
>> const dimensions = dims;
>> alias T ElementType;
>>
>> ...
>> }
>
> Hey Oskar,
> I've been working on implementing something similar. I'd be interested
> to see some parts of your implementation.
> Mine's at
> http://www.dsource.org/projects/multiarray
Hi Bill,
I'm sorry for the late reply. I've had a quite hectic last few months
and not had time to dig into this. I have now at least extracted my old
multiarray code and turned it into something that compiles (and
converted it to use the new D variadic templates instead of the old
variadic.d hack):
http://www.csc.kth.se/~ol/multiarray.tar.gz
the multiarray module is sci.matrix.multiarray. Everything is in a very
crude shape.
> The main difference between our approaches seems to be that I've made
> 'dims' non-const (and therefore not a template parameter).
>
> I thought that would be better since sometimes it's handy to reshape a
> 1-d array into an equivalent 2d Nx1 or 3-d Nx1x1. And you have the
> potential to do broadcasting more easily that way too (E.g. multiply an
> N dim 1-d array times an Nx5 array just by duplicating the one axis
> implicitly). The down side is that range and stride then become dynamic
> arrays.
> I guess with your approach you can create a new view with the
> right # of dims when needed.
Yes. This is very powerful.
Given a 10x10x10 array arr, arr[0,all,all] will be a 10x10 array, not a
1x10x10 array or anything else. And that is all deduced at compile time.
> On the other hand that probably means you
> need more templated member functions, parameterized by Array dimension.
Possibly.
> Another tradeoff is that I can also have indexing and slicing return
> arrays of different dimension, but I can't have indexing return a single
> element (because the return type of opIndex is Array not float and you
> can't have two different return types). Instead I overloaded opCall for
> that. So M(4,3) is the value of element 4,3, but M[4,3] is a 1-element
> view.
In addition to being able to have full indexing return a single element
vs partial indexing returning slices I think the greatest advantage of
having the dimensionality as a compile time constant is the improved
compile time consistency checks.
I believe you will extremely rarely dynamically adjust the
dimensionality of a multidimensional array. You are almost always
required to know the dimensionality at compile time anyway. You can
always make slices of different dimensionality. Having a dimension with
a stride of 0, you could even duplicate the data over a dimension such
as [1,1,1,1,... (10000 times) ...,1] only occupying a single unit of
memory. Reshaping has to be done explicitly (which I see as a good thing).
In general, I think this falls back on the same conclusion as the
discussion regarding intrinsic vector operations and small static arrays
being value types a while ago. Low dimensionalities will most of the
time turn out to be static, while larger dimensions are dynamic. The
dimensionality of a multidimensional array will typically be static and
of a low order (For example, I've personally never encountered a case
needing more than 6 dimensions).
> I also made it a class in the end, though I started out with it being a
> struct. Partly that decision was motivated by the idea that generally I
> don't really want to pass 6 or more 32-bit numbers around by value, plus
> the presence of pointer member data, so using a class seemed reasonable.
> Having to remember to say 'new' everywhere has been a pain, though.
The biggest disadvantage I see is that data shuffling (such as in a loop
assigning slices of one array to slices of another one) will result in
the creation of lots of temporary heap allocated classes instead of,
when using a struct, a small constant stack memory usage.
One option could possibly be to make slicing return proxy structs that
(in the future implicitly) converted to class instances.
Note that the by-value passing doesn't necessarily mean any reduction in
performance. Being careful about making functions not changing the shape
of the array take the array as an inout (future const/readonly ref)
while those that need must make the same data-copy when dup-ing the
class anyway.
Another option to avoid passing lots of data by value could possibly be
to make a container struct containing a pointer to another struct
containing all the data.
> I have mine hooked up with some basic BLAS routines -- the routines for
> VV, MV, and MM multiplication, and the simplest least squares and linear
> system solvers.
That sounds really interesting. I will definitely give it a look.
/Oskar
More information about the Digitalmars-d-announce
mailing list