Array Operations: a[] + b[] etc.
John Colvin
john.loughran.colvin at gmail.com
Wed Nov 21 10:02:23 PST 2012
First things first: I'm not actually sure what the current spec for this is.
http://dlang.org/arrays.html is not the clearest on the subject, and it
seems to rule out a lot of things that I reckon should work.
For this post I'm going to use the latest dmd from GitHub. Behaviour
sometimes differs considerably between versions of dmd, let alone
between dmd, gdc and ldc.
For example (each operation below starts from these initial values):
int[] a = [1,2,3,4];
int[] b = [6,7,8,9];
int[] c;
int[] d = [10];
int[] e = [0,0,0,0];
a[] += b[]; // result [7, 9, 11, 13], as expected.
c = a[] + b[]; // Error: Array operation a[] + b[] not implemented.
c[] = a[] + b[]; // result [], a run-time assert on some compilers/versions
d[] = a[] + b[]; // result [7], also a run-time assert on some compilers/versions
My vision of how things could work:
c = a[] opBinary b[];
should be legal. It should create a new array that is then
reference assigned to c.
d[] = a[] opBinary b[];
should mean d[i] = a[i] opBinary b[i] for all i in 0..length.
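Until something like this is in the language, the allocating form can be approximated with a small library function. This is a minimal sketch, assuming a hypothetical helper named elementwise (not part of Phobos) that takes the operator as a compile-time string and rejects operands of different lengths:

```d
import std.stdio : writeln;

/// Hypothetical helper (not a library function): returns a freshly
/// allocated array holding a[i] op b[i] for every i, which is roughly
/// what `c = a[] op b[];` would do under the proposal.
T[] elementwise(string op, T)(const(T)[] a, const(T)[] b)
{
    if (a.length != b.length)
        throw new Exception("elementwise: operand lengths differ");
    auto c = new T[](a.length);
    // Expands at compile time to e.g. `c[] = a[] + b[];`
    mixin("c[] = a[] " ~ op ~ " b[];");
    return c;
}

void main()
{
    int[] a = [1, 2, 3, 4];
    int[] b = [6, 7, 8, 9];
    int[] c = elementwise!"+"(a, b); // new array, reference-assigned to c
    writeln(c);                      // [7, 9, 11, 13]
    writeln(elementwise!"*"(a, b));  // [6, 14, 24, 36]
}
```

Passing the operator as a string keeps a single helper for +, -, * and /, and the work itself is still done by an ordinary D array operation.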
What should the length be? Do we silently truncate to the shortest
array, or do we assert at run time (as ldc2 does, and as dmd did for
a while between 2.060 and now)? Currently dmd (and gdc) do neither
reliably. For example:
d[] = a[] + b[] results in [7], and
a[] = d[] + b[] results in [16, 32747, 38805832, 67108873],
i.e. everything after the first element is apparently read from past
the end of d.
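To make the two policies concrete, here is a sketch of each as a free function (addChecked and addTruncated are hypothetical, illustrative names, not existing library functions). Either one avoids the out-of-bounds read above, because the lengths are dealt with before the array operation runs:

```d
import std.algorithm.comparison : min;
import std.stdio : writeln;

/// Hypothetical helper for the "run-time check" policy:
/// mismatched lengths are an error, never a silent garbage read.
T[] addChecked(T)(const(T)[] a, const(T)[] b)
{
    if (a.length != b.length)
        throw new Exception("addChecked: operand lengths differ");
    auto c = new T[](a.length);
    c[] = a[] + b[];
    return c;
}

/// Hypothetical helper for the "truncate to the shortest" policy:
/// only the first min(a.length, b.length) elements are combined.
T[] addTruncated(T)(const(T)[] a, const(T)[] b)
{
    immutable n = min(a.length, b.length);
    auto c = new T[](n);
    c[] = a[0 .. n] + b[0 .. n];
    return c;
}

void main()
{
    int[] b = [6, 7, 8, 9];
    int[] d = [10];
    writeln(addTruncated(d, b)); // [16]
    try
    {
        addChecked(d, b);
    }
    catch (Exception e)
    {
        writeln(e.msg); // addChecked: operand lengths differ
    }
}
```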
Another nice thing that I miss from working in IDL (though I'm not
sure how it would be possible in D): given a multidimensional array,
I should be able to slice and index along any axis.
For example:
int[4][3] m = [[0,1,2,3],
[4,5,6,7],
[8,9,10,11]];
I can index vertically, i.e. m[1] == [4,5,6,7] (a whole row), but
there's no syntactic sugar for indexing horizontally. m[][2] just
gives me the 3rd row, so what could be a nice concise statement
suddenly requires a manually written loop that the compiler then has
to work its way through to recover the intent (see Walter on this,
here: http://www.drdobbs.com/loop-optimizations/229300270).
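For reference, this is roughly what extracting a column looks like today. A minimal sketch with two hypothetical helpers: columnLoop is the hand-written loop, and columnMap expresses the same thing with std.algorithm's map. Both copy the column into a new array:

```d
import std.algorithm.iteration : map;
import std.array : array;
import std.stdio : writeln;

/// Hypothetical helper: copies column j out of a slice of fixed-size rows
/// with an explicit loop.
T[] columnLoop(T, size_t N)(T[N][] rows, size_t j)
{
    T[] col;
    foreach (ref row; rows)
        col ~= row[j];
    return col;
}

/// Hypothetical helper: the same column, written as a map over the rows.
auto columnMap(R)(R rows, size_t j)
{
    return rows.map!(row => row[j]).array;
}

void main()
{
    int[4][3] m = [[0, 1, 2, 3],
                   [4, 5, 6, 7],
                   [8, 9, 10, 11]];
    writeln(m[1]);               // [4, 5, 6, 7]  (a row: built in)
    writeln(columnLoop(m[], 2)); // [2, 6, 10]    (a column: by hand)
    writeln(columnMap(m[], 2));  // [2, 6, 10]
}
```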
A possible approach, heavily tried and tested in numpy and IDL:
http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html
http://www.atmos.umd.edu/~gcm/usefuldocs/IDL.html#operations
Use multiple indices within the brackets.
m[1,2] would be identical to m[1][2], returning 6
m[0..2,3] would return [3,7]
m[,2] would give me [2,6,10]
Alternative syntaxes could be m[*,2] or m[:,2], or we could even
require m[0..$,2]. I don't know how much of a technical challenge
each of these would pose for lexing and parsing.
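Part of this can be emulated in library code, since D lets user-defined types overload opIndex with several arguments, plus opSlice and opDollar per dimension. The Matrix struct below is a hypothetical, minimal row-major wrapper (not an existing library type); it supports m[1, 2], m[0 .. 2, 3] and m[0 .. $, 2], but the bare m[, 2] form would still need new syntax:

```d
import std.stdio : writeln;

/// Hypothetical row-major matrix wrapper (not a library type).
struct Matrix(T)
{
    T[] data;
    size_t rows, cols;

    this(size_t rows, size_t cols, T[] data)
    {
        assert(data.length == rows * cols, "data does not match rows * cols");
        this.rows = rows;
        this.cols = cols;
        this.data = data;
    }

    // m[i, j]: a single element, like m[i][j] on a nested array.
    ref T opIndex(size_t i, size_t j)
    {
        return data[i * cols + j];
    }

    // lo .. hi inside m[...] becomes a pair of bounds for dimension `dim`.
    size_t[2] opSlice(size_t dim)(size_t lo, size_t hi)
    {
        return [lo, hi];
    }

    // $ means "rows" in the first position and "cols" in the second.
    size_t opDollar(size_t dim)()
    {
        return dim == 0 ? rows : cols;
    }

    // m[lo .. hi, j]: a copy of rows lo .. hi of column j.
    T[] opIndex(size_t[2] r, size_t j)
    {
        T[] result;
        foreach (i; r[0] .. r[1])
            result ~= data[i * cols + j];
        return result;
    }
}

void main()
{
    auto m = Matrix!int(3, 4, [0, 1, 2, 3,
                               4, 5, 6, 7,
                               8, 9, 10, 11]);
    writeln(m[1, 2]);      // 6
    writeln(m[0 .. 2, 3]); // [3, 7]
    writeln(m[0 .. $, 2]); // [2, 6, 10]
}
```

The column slice here returns a copy; a strided view that shares the data would need its own range type.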
//An example: let's imagine a greyscale image, stored as an array of pixel rows:
double[][] img = read_bmp(fn,"grey");
//we want to crop it to some user-defined co-ords (x1,y1),(x2,y2):
//Version A, current syntax
auto img_cropped = img[y1..y2].dup;
foreach(ref row; img_cropped) {
    row = row[x1..x2];
}
//3 lines of code for a very simple idea.
//Version B, new syntax
auto img_cropped = img[y1..y2, x1..x2];
//Very simple, easy-to-read code that is clear in its purpose.
I propose that Version B would be equivalent to A: an independent
window on the data. Reassigning a row of the result (i.e. pointing it
somewhere else, rather than copying new data into it) would have no
effect on the original array. This scales naturally to higher
dimensions and agrees with the normal slicing rules: the slice itself
is independent of the original, but the data inside is shared.
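The proposed semantics can be checked against Version A, which already behaves this way. A minimal sketch wrapping Version A in a hypothetical crop function (the name and the (x1, y1, x2, y2) parameter order are assumptions), showing that writes to elements go through to the original image while re-pointing a row does not:

```d
import std.stdio : writeln;

/// Hypothetical helper: Version A wrapped in a function.
/// Returns a new array of row slices; the pixel data itself is shared.
T[][] crop(T)(T[][] img, size_t x1, size_t y1, size_t x2, size_t y2)
{
    auto cropped = img[y1 .. y2].dup; // copies only the row slices
    foreach (ref row; cropped)
        row = row[x1 .. x2];          // narrows each slice, no pixel data copied
    return cropped;
}

void main()
{
    double[][] img = [[0.0, 1, 2, 3],
                      [4.0, 5, 6, 7],
                      [8.0, 9, 10, 11]];
    auto c = crop(img, 1, 0, 3, 2);   // columns 1..3 of rows 0..2
    assert(c == [[1.0, 2], [5.0, 6]]);

    c[0][0] = 100;                    // writing an element is visible in img...
    assert(img[0][1] == 100);

    c[1] = [-1.0, -1.0];              // ...but re-pointing a row is not.
    assert(img[1] == [4.0, 5, 6, 7]);
    writeln(c);
}
```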
I believe this would be a significant improvement to D,
particularly for image processing and scientific applications.
P.S.
As you can probably tell, I have no experience in compiler
design! I may be missing something that makes all of this
impossible/impractical. I also don't think this would have to
cause any code breakage at all, but again, I could be wrong.
P.P.S.
I think there may be something quite wrong with how the frontend
understands the current array expression syntax... see here:
http://dpaste.dzfl.pl/f4a931db
More information about the Digitalmars-d mailing list