Array Operations: a[] + b[] etc.

Wed Nov 21 10:02:23 PST 2012

First things first: I'm not actually sure what the current spec 
for this is. http://dlang.org/arrays.html is not the clearest on 
the subject and seems to rule out a lot of things that I reckon 
should work.

For this post I'm going to use the latest dmd from github. 
Behaviour is sometimes quite different for different versions of 
dmd, let alone gdc or ldc.

e.g.

int[] a = [1,2,3,4];
int[] b = [6,7,8,9];
int[] c;
int[] d = [10];
int[] e = [0,0,0,0];

a[] += b[];       // result [7, 9, 11, 13], as expected.

c = a[] + b[];    // Error: Array operation a[] + b[] not implemented.

c[] = a[] + b[];  // result [], is a run-time assert on some compiler(s)/versions
d[] = a[] + b[];  // result [7], also a run-time assert on some compiler(s)/versions

My vision of how things could work:
c = a[] opBinary b[];
should be legal. It should create a new array that is then 
reference assigned to c.

d[] = a[] opBinary b[];
should be d[i] = a[i] opBinary b[i] for all i in 0..length.
What should the length be? Do we silently truncate to the 
shortest array, or do we run-time assert (as ldc2 does, and as 
dmd did for a while between 2.060 and now)? Currently dmd (and 
gdc) does neither of these reliably, e.g.
d[] = a[] + b[] results in [7],
a[] = d[] + b[] results in [16, 32747, 38805832, 67108873]
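
To make the two behaviours above concrete, here is a minimal 
sketch in today's D of what I'd expect them to mean. The helper 
names added and addInto are made up for illustration (they are not 
in druntime or Phobos), and I've assumed the "run-time assert on 
mismatched lengths" option rather than silent truncation.

```d
import std.stdio : writeln;

// Hypothetical helper: the allocating form, c = a[] + b[].
// Builds a fresh array and returns it, so the caller can
// reference-assign the result.
int[] added(const(int)[] a, const(int)[] b)
{
    assert(a.length == b.length, "operand lengths differ");
    auto result = new int[a.length];
    foreach (i; 0 .. a.length)
        result[i] = a[i] + b[i];
    return result;
}

// Hypothetical helper: the in-place form, d[] = a[] + b[].
// Assumes all three lengths must match; no silent truncation.
void addInto(int[] dst, const(int)[] a, const(int)[] b)
{
    assert(dst.length == a.length && a.length == b.length,
           "array lengths differ");
    foreach (i; 0 .. dst.length)
        dst[i] = a[i] + b[i];
}

void main()
{
    int[] a = [1, 2, 3, 4];
    int[] b = [6, 7, 8, 9];

    int[] c = added(a, b);   // new array, reference-assigned to c
    writeln(c);              // [7, 9, 11, 13]

    int[] e = [0, 0, 0, 0];
    addInto(e, a, b);        // element-wise copy into existing storage
    writeln(e);              // [7, 9, 11, 13]
}
```

With these rules, the d = [10] case would trip the assert instead 
of quietly producing [7] or reading past the end of d. Note that 
assert is compiled out with -release, so a real implementation 
would need to decide whether the check survives in release builds.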

Another nice thing to be able to do, which I miss from working in 
IDL (I'm not sure how it would be possible in D, though): 
given a multidimensional array, I should be able to slice and 
index along any axis.
For example:
int[4][3] m = [[0,1,2,3],
                [4,5,6,7],
                [8,9,10,11]];
I can index vertically, i.e. m[1] == [4,5,6,7], but there's no 
syntactic sugar for indexing horizontally. Obviously m[][2] just 
gives me the 3rd row, so what could be a nice concise statement 
suddenly requires a manually written loop that the compiler has 
to work its way through, extracting the meaning (see Walter on 
this, here: http://www.drdobbs.com/loop-optimizations/229300270).

A possible approach, heavily tried and tested in numpy and IDL: 
http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html
http://www.atmos.umd.edu/~gcm/usefuldocs/IDL.html#operations

Use multiple indices within the brackets.
     m[1,2] would be identical to m[1][2], returning 6
     m[0..2,3] would return [3,7]
     m[,2] would give me [2,6,10]
     Alternative syntax could be m[*,2], m[:,2] or we could even 
require m[0..$,2]. I don't know how much of a technical challenge 
each of these would be for lexing and parsing.
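
For comparison, this is roughly how the m[,2] and m[0..2,3] 
results can be obtained in current D using std.algorithm.map. 
It is only a sketch: columnOf is a name I made up, and it is 
hard-wired to rows of type int[4] to match the example above.

```d
import std.algorithm : map;
import std.array : array;
import std.stdio : writeln;

// Hypothetical helper: pick element `col` out of every row.
// Roughly what m[r1..r2, col] would mean under the proposal.
int[] columnOf(int[4][] rows, size_t col)
{
    return rows.map!(row => row[col]).array;
}

void main()
{
    int[4][3] m = [[0, 1, 2, 3],
                   [4, 5, 6, 7],
                   [8, 9, 10, 11]];

    writeln(columnOf(m[], 2));       // [2, 6, 10], i.e. m[,2]
    writeln(columnOf(m[0 .. 2], 3)); // [3, 7],     i.e. m[0..2,3]
    writeln(m[1][2]);                // 6,          i.e. m[1,2]
}
```

It works, but the intent is buried in a lambda and the result is a 
copy rather than a view, which is part of why I'd like the sugar.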

//An example: let's imagine a greyscale image, stored as an array
//of pixel rows:

double[][] img = read_bmp(fn,"grey");

//we want to crop it to some user defined co-ords (x1,y1),(x2,y2):

//Version A, current syntax

auto img_cropped = img[y1..y2].dup;
foreach(ref row; img_cropped) {
     row = row[x1..x2];
}
//3 lines of code for a very simple idea.

//Version B, new syntax

auto img_cropped = img[y1..y2, x1..x2];

//Very simple, easy to read code that is clear in its purpose.

I propose that Version B be equivalent to Version A: an 
independent window on the data. Reassigning a row (i.e. pointing 
it somewhere else, not copying new data in) has no effect on the 
original image. This scales naturally to higher dimensions and 
agrees with the normal slicing rules: the slice itself is 
independent of the original, but the data inside is shared.
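
The semantics I have in mind can be checked against Version A 
today. In this sketch crop is a hypothetical helper that just 
wraps Version A, and the 3x4 "image" is made-up data standing in 
for whatever read_bmp would return.

```d
import std.stdio : writeln;

// Hypothetical helper wrapping Version A: what I'd want
// img[y1..y2, x1..x2] to mean.
double[][] crop(double[][] img, size_t y1, size_t y2,
                size_t x1, size_t x2)
{
    auto window = img[y1 .. y2].dup; // copies the row slices, not the pixels
    foreach (ref row; window)
        row = row[x1 .. x2];         // narrow each row to the wanted columns
    return window;
}

void main()
{
    double[][] img = [[0.0, 1, 2, 3],
                      [4.0, 5, 6, 7],
                      [8.0, 9, 10, 11]];

    auto w = crop(img, 0, 2, 1, 3);
    writeln(w);                      // [[1, 2], [5, 6]]

    w[0][0] = 42;                    // pixel data is shared...
    assert(img[0][1] == 42);

    w[1] = [0.0, 0];                 // ...but re-pointing a row is not
    assert(img[1] == [4.0, 5, 6, 7]);
}
```

So writing through the window changes the image, while pointing 
one of the window's rows at a different array leaves the image 
alone, exactly as with an ordinary one-dimensional slice.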

I believe this would be a significant improvement to D, 
particularly for image processing and scientific applications.

P.S.
As you can probably tell, I have no experience in compiler 
design! I may be missing something that makes all of this 
impossible/impractical. I also don't think this would have to 
cause any code breakage at all, but again, I could be wrong.

P.P.S.
I think there may be something quite wrong with how the frontend 
understands current array expression syntax... see here: 
http://dpaste.dzfl.pl/f4a931db