Resizable Arrays?

Fri Feb 27 19:51:44 PST 2009

== Quote from dsimcha (dsimcha at yahoo.com)'s article
> == Quote from Jason House (jason.james.house at gmail.com)'s article
> > Are there any good reasons to allow built in arrays to be resizable?
> > There's already plenty of empirical evidence that building arrays by appending
> is incredibly slow.
> > There are also bugs in bugzilla where resizing arrays after assigning one array
> to another violates type safety. Those bugs can be solved by always reallocatimg
> arrays when resizing, but that makes performance even worse...
> > While not much of an argument, C# also prevents array resizing.
> > Is it time to rely on standard libraries for data structures and let built in
> arrays be fixed length? It's a breaking change...
> I think that builtin arrays in D are a godsend.  Resizable arrays are such an
> important, heavily used feature that they should get special treatment from the
> language, to ensure that they are syntactically elegant, efficient both at runtime
> and in terms of compilation time, easy to use, free of odd corner cases, and in
> general are true first class citizens.  Nonetheless, D's arrays do have some rough
> edges.  My proposal would be as follows:
> There should be two array types:
> T[new] and T[].
> This has been proposed before, though not in as much detail as what I'm
> suggesting.  The semantics should work as follows:
> T[new] is considered a subtype of T[].  This works mostly like you would expect.
> T[new] can be implicitly converted to T[].  The twist is that T[] can also be
> assigned to T[new] without any explicit conversion, but a copy of the underlying
> data is implicitly made for safety.
> Assigning either a T[] or a T[new] to a T[] will result in aliasing:
> T[] foo;
> T[new] bar = new T[N];
> bar[0] = 1;
> foo = bar;
> foo[0] = 2;
> writeln(bar[0]);  // Prints 2.
> T[new]s are always assigned by value to make concatenation and resizing safe.  The
> underlying data is copied implicitly when assigning from a T[] to a T[new] or from
> a T[new] to another T[new].
> T[new] foo;
> T[] bar = new T[N];  // new returns T[new], using implicit conversion.
> bar[0] = 1;
> foo = bar;
> foo[0] = 2;
> writeln(bar[0]);  // Prints 1.
> T[]s can be sliced, but cannot be enlarged or appended to.  They should be laid
> out the same way they are now, as a struct of a pointer and a length.
> T[new]s can be appended to and enlarged in addition to being sliced.  Their layout
> should be a struct of pointer, length, capacity to make appending fast.  This will
> also make implicit conversion to T[] essentially free, since all that is necessary
> is slicing off the capacity field.
> I also wonder if this proposal could be extended to fix some of the covariance
> issues with arrays (see Bugzilla 2095).  This might work by only allowing
> covariant types to be assigned to T[new], not T[].  For example:
> class A{}
> class B:A{}
> B[new] ba = [new B];
> A[] aa = ba;  // Error:  Cannot assign covariant types to T[].
> A[new] aa = ba;  // Works, but copy is implicitly made.

One minor detail that I thought of:  When returning a T[new] from a function, the
copying could be optimized away.  If any escaping happened during the function
call, this would be either by value or as a T[] instead of a T[new].  Therefore,
at the end of a function, the only T[new] reference to the heap space in question
is guaranteed to be the T[new] that is about to be returned.  Since the reference
in the callee is about to be destroyed, it can safely be returned without any copying.