Arrays: Core or Library Type?

Thu May 29 08:17:30 PDT 2008

On 2008-05-29 03:13:39 +0200, Walter Bright <newshound1 at digitalmars.com> said:

> http://reddit.com/info/6l6i4/comments/
> 
> http://dobbscodetalk.com/index.php?option=com_myblog&show=Arrays-Core-or-Library-Type-.html&Itemid=29

a 
> 
little OT but I am implementing multidimensional dense rectangular 
arrays in D and they are coming out quite well.
Obviously they are Library arrays and in (due to effort/usefulness 
ratio) I have done only arrays with dynamically chosen size (but 
compile time rank).
There is an overhead in this choice, (and for example this arrays are 
not suited for 3 (or 4) elements vectors, but an arrays 3D vectors... 
works nicely.

For me the performance was important, and so I defined looping 
constructs, that could also be parallelized with multiple threads in 
the future.

Writing this in D 1.0 I really noted that D 2.0 offers definite 
improvements that make me want to move to it: being able to define a 
member function outside a class (future), structure destructurs, and a 
very nice example of why invariant is useful.
I tested the performance with a sort of dot product on 2D slices of 3D 
arrays (a very though test as you do just a multiply add per element), 
and using the looping templates I get a performance close to the one of 
the optimal loop with pointers, and to the fortran version.
Writing the loop externally with explicit indexing:
    for (int i=0;i<ndim;i++)
    for (int j=0;j<ndim;j++)
    for (int k=0;k<ndim;k++)
        res[i]+=d1[i,j,k]*d2[0,j,k];

(that a fortran compiler optimizes close to the optimal one) is 8-9 
times slower.
This because even if the indexing operation
    double opIndex(int i,int j, int k){
        int i1,i2;
        i1=startIdx+i*strides[0];
        i2=i1+j*strides[1];
        return m_data[i2+k*strides[2]];
    }

is inlined the assignements i1 and i2 are not floated out of the inner loop.
If strides and startIdx are declared invariant then the compiler knows 
that floating them to the outer loops is safe and might do it, bringing 
the time down to 2-3 times the optimal code.
Replacing multiplications with additions in the indexes and using 
pointers instead of array indexing bring it close to the optimal code.

(
Actually in D 1.0 using just single variables and not an array (I 
wasn't able to use a const array) and declaring them const should  
allow the compiler to do these optimizations, but at least gdc does not 
do it, the code gets ugly, and the templatized version of it is 
actually slower than the non const version... but invariant is really 
what you want.
)

So with these things I think that an array in D 2.0 can come close the 
core one and this a testament on how good D is.
This does not contradict the fact that I think that a core array (at 
least for 1D) is a *very* nice thing to have.

Fawzi