Array types

Thu Aug 26 14:38:56 PDT 2010

Arrays are one of the most useful and most efficient data structures for nonfunctional languages. They look simple, but in a low-level language you sometimes need various kinds of them. So they are not so simple.

In D there are two kinds of built-in arrays:
A) 1D Dynamic arrays on the heap
B) Almost-nD rectangular fixed-sized arrays allocated on the stack by functions (or data segment), they may also go on the heap if they are part of an object (and with placement new they may go everywhere).

So far in D code I have had need of (and I have seen some people need) some other kinds of arrays, not (well) supported by D:
C) nD rectangular dynamic arrays;
D) Dynamic arrays allocated on the stack;
E) Fixed-sized arrays allocated on the heap;
F) Variable-sized structs allocated on the heap.

----------------------

The (C) arrays are not the same thing as dynamic arrays of dynamic arrays because:
- Some algorithms are not designed for a triangle where rows may differ in length. Testing that rows are all the same length at runtime wastes time, and if you don't test it then it may cause bugs;
- Sometimes you want to reshape a 2D matrix, so it's useful to store the matrix in contiguous memory;
- Sometimes you need complex slicing, for example removal of some columns and rows from a 2D matrix.

Recently in D.learn we have discussed this a bit.

In C# there are both arrays of arrays as in D, and rectangular dynamic arrays, that are manages with a syntax similar to Pascal arrays: arr[x,y,z]. I think there is no so need to put them into D. But I think it's important to have a official type of them in Phobos, for example implemented as a struct (only the item type and number of dimensions are known at compile time):

RectangularArray!(T, int ndimensions)

----------------------

The (D) arrays are sometimes useful in some high-performance routines. You can't implement them in Phobos. C99 has added them as VLA arrays. In bug 4357 I have proposed to use "scope", but that was not appreciated.

If you want (D) arrays you currently need to write something like (manually inlined code):

T[] arr = (cast(T*)alloca(n * T.sizeof))[0 .. n];
arr[] = T.init; // optional, but useful if T is a pointer or struct that contains pointers

This is not too much bad, and using a template mixin may help the situation.

D is designed to make simple the safe operations (like to create (A) arrays), and to make more elaborate the less safe operations. But in my opinion it's wrong to make something less safe than possible. Even if stack-allocated arrays are unsafe (if you escape their pointer/reference/fat pointer you will have bugs), there is no need to make _their creation too_ more unsafe/bug-prone than necessary.

A possibility is to use the C99 syntax, that is just allowing runtime variables when you define a "fixed-sized" array:

int n = 10;
int[n] arr;

That syntax is clean, readable, standard (it comes from C), easy to understand. A disadvantage is that visually it's not easy to tell apart a VLA from a normal fixed-sized array.

This is not allowed, because n is a runtime value:

int[n] foo(int n) {
  int[n] arr;
  return arr;
}

But this is allowed, and the arr needs to be copied on the heap:

int[] foo(int n) {
  int[n] arr;
  return arr;
}

I don't like this much, it adds some complexity and more cases.

----------------------

I have used (E) arrays while creating a deque, composed of fixed sized arrays on the heap. In this case their lengths are fixed and known at compile-time. In D I usually implement them wrapping them into a struct:

struct Foo(T, int SIZE) {
    T[SIZE] a;
}
void main() {
    auto f = new Foo!(int, 1024)();
}

Later I use the array as f.a[x]. This is not too much bad. Implementing (E) arrays in D gives problems syntax-wise, because in C you are free to use pointers too as arrays. So I think the current situation (using the wrapper struct) is good enough for the few situations were (E) arrays are useful.

----------------------

D supports zero-length fixed-sized arrays, useful to implement (F) "arrays" (but it seems LLVM does't support them, so in LDC they need 1 bit, I am not sure where such bit actually goes).

Variable-sized structs are not commonly used in high-level code, they are uncommon in general. So there is no need for D to support them as built-ins, they may be implemented manually by the programmer as need arise, also because they probably need to be quite customized for each situation.

On the other hand it's not too much bad to put into Phobos an intrusive Variable-sized struct type, something like:

struct VariableLengthStruct(T, Cargo) {
   Cargo cargo;
   size_t length;
   T[0] data;
   // operator methods
}

Where Cargo is the type of the intrusive payload, and VariableLengthStruct implements the opIndex method too that maps on the data attribute.

A helper function too may be added to help the allocation of a VariableLengthStruct.

In some situations VariableLengthStruct can't be used, because you need more custom structure, but I think that VariableLengthStruct is sometimes enough.

----------------------

The summary of this post is:
- It's positive to add efficient and _standard_ nD rectangular dynamic arrays to Phobos, as RectangularArray.
- Maybe a standard variable-sized struct is good in Phobos, like VariableLengthStruct.
- I'd still like syntax support for variable-sized stack-allocated arrays. But I don't like the extra complexity it seem to need. More thinking is needed about this.

Bye,
bearophile