The demise of T[new]

Sun Oct 18 15:13:36 PDT 2009

== Quote from Walter Bright (newshound1 at digitalmars.com)'s article
> The purpose of T[new] was to solve the problems T[] had with passing T[]
> to a function and then the function resizes the T[]. What happens with
> the original?
> The solution we came up with was to create a third array type, T[new],
> which was a reference type.
> Andrei had the idea that T[new] could be dispensed with by making a
> "builder" library type to handle creating arrays by doing things like
> appending, and then delivering a finished T[] type. This is similar to
> what std.outbuffer and std.array.Appender do, they just need a bit of
> refining.
> The .length property of T[] would then become an rvalue only, not an
> lvalue, and ~= would no longer be allowed for T[].
> We both feel that this would simplify D, make it more flexible, and
> remove some awkward corner cases like the inability to say a.length++.
> What do you think?

This is ridiculous.  The status quo works well most of the time and just has a few
really ugly corner cases.  As long as you're only appending to one array at a
time, appending isn't even that slow anymore now that bug 2900
(http://d.puremagic.com/issues/show_bug.cgi?id=2900) is fixed.

I frankly think it's absurd to make working with arrays an order of magnitude
harder and less elegant in the 90+% of cases where they work fine just to fix a
few corner case bugs.  Don't get me wrong, the corner case bugs should be fixed
because they're pretty nasty safety issues.  They just shouldn't be fixed in a way
that makes arrays substantially harder to use, or even syntactically uglier, in
the cases where they already work well.  As good a programmer as Andrei is, I'm
sure whatever he comes up with will be much less syntactically pleasing and less
easy to use than something the core language understands.
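
For comparison, this is roughly what the builder style looks like with
std.array.Appender in present-day Phobos. The 2009 Appender interface may have
differed. buildSquares is a made-up helper used only for illustration. With T[] as
it is today, the loop body would just be a ~= on a plain int[].

```d
import std.array : appender;

// Hypothetical helper: builds [0, 1, 4, ...] through a builder object
// instead of growing a T[] with ~=.
int[] buildSquares(size_t n)
{
    auto app = appender!(int[])();   // the "builder" that owns the growing array
    app.reserve(n);                  // pre-allocate capacity up front
    foreach (i; 0 .. n)
        app.put(cast(int)(i * i));   // append one element
    return app.data;                 // deliver the finished T[]
}

unittest
{
    assert(buildSquares(4) == [0, 1, 4, 9]);
    assert(buildSquares(0).length == 0);
}
```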

Here's my proposal for how T[new] should work:

1.  It should be a reference type to be consistent with slices.  Yes, slices are
kind of a hybrid, but they're more semantically similar to reference types than
value types.  If you can't modify the length of a slice anymore, then for all
practical purposes it will be a reference type.

2.  A T[new] should support all the same operations as a T[] with semantics as
similar as common sense will allow, including indexing, ~, .dup, .idup, slice
assignment, etc.  Basically, it should have, to the greatest degree possible
without defeating the purpose, the same compile-time interface.

3.  A T[new] should be implicitly convertible to a slice.  For example:

auto foo = someFunctionThatReturnsTnew();
// foo is a T[new].
T[] bar = someFunctionThatReturnsTnew();
// Works.  bar is a T[].  The T[new] went into oblivion.

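This solves the problem of slices not being closed under .dup and ~: even if
those operations return a T[new], the result can still be used wherever a T[] is
expected.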

4.  It should be guaranteed that no block of memory is ever referenced by more
than one T[new] instance.  This is needed to guarantee safety when appending to
immutable arrays, etc.

5.  Assigning a T[new] to another T[new] should be by reference, just like
assigning a class instance to another class instance.  Assigning a T[] to a T[new]
should duplicate the memory block referenced by the T[] because this is probably
the only way to guarantee (4).

6.  Since T[new] guarantees unique access to a memory block, it should have an
assumeUnique() method that returns an immutable slice and sets the T[new]'s
reference to the memory block to null.  This solves the problem of building
immutable arrays without the performance penalty of not being able to pre-allocate
or the unsafeness of having to cowboy cast it to immutable.

7.  As long as the GC is conservative, there absolutely *must* be a method of
manually freeing the memory block referenced by a T[new] provided that the GC
supports this operation, though it doesn't have to be particularly pretty.  In
general, since D is a systems language, T[new] should not be too opaque.  A good
way to do this might be to make all of the fields of the T[new] public but
undocumented.  If you *really* want to mess with it, you'll read the source code
and figure it out.

8.  The first call to opSlice on a T[new] should set a flag that indicates that
there may be multiple pointers to the underlying memory block.  Before that flag
is set, appends to a T[new] should result in calls to GC.free() to free the old
block whenever it needs to be expanded (since we can guarantee that we own it
exclusively).  This will help deal with false pointer issues, since D's GC looks
like it will remain conservative for the foreseeable future.
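
The following is a rough library approximation in present-day D2. It is here only
to pin down the semantics I'm asking the core language to provide. Everything in
it is hypothetical: NewArray, its assumeUnique() and free() methods, and the
doubling growth policy are illustrative assumptions, not an existing Phobos or
druntime API. The sketch maps to the points above as follows:

- It uses a class, so assignment copies the reference (1, 5).
- It copies when constructed from a T[] (4, 5).
- It converts implicitly to a slice (3).
- It frees the old block eagerly on growth, but only while no slice has been handed out (8).

It does not track pointers escaped through the ref returned by opIndex, so unlike
a built-in T[new] it is not memory-safe as written.

```d
import core.memory : GC;

// Hypothetical emulation of the proposed T[new]; NewArray is not a
// real Phobos or druntime type. A class gives reference semantics
// on assignment (points 1 and 5).
final class NewArray(T)
{
    private T[] buf;      // owned memory block; buf.length is the capacity
    private size_t len;   // number of elements in use
    private bool sliced;  // set once a slice may alias buf (point 8)

    // Construction from a T[] always copies, so this object is the
    // sole owner of its block (points 4 and 5).
    this(const(T)[] src = null)
    {
        buf = src.dup;
        len = buf.length;
    }

    size_t length() const { return len; }

    ref T opIndex(size_t i)
    {
        assert(i < len, "NewArray index out of bounds");
        return buf[i];
    }

    // Handing out a slice means the block may now be aliased.
    T[] opSlice()
    {
        sliced = true;
        return buf[0 .. len];
    }

    // Implicit conversion to T[] goes through opSlice (point 3).
    alias opSlice this;

    // Append, doubling the capacity when the block is full.
    void opOpAssign(string op : "~")(T value)
    {
        if (len == buf.length)
        {
            auto bigger = new T[buf.length ? buf.length * 2 : 4];
            bigger[0 .. len] = buf[0 .. len];
            // Free the old block eagerly only if it was never sliced
            // (point 8). GC.free does nothing for interior pointers.
            if (!sliced && buf.ptr !is null)
                GC.free(buf.ptr);
            buf = bigger;
            sliced = false;   // nobody has sliced the new block yet
        }
        buf[len++] = value;
    }

    // Point 6: hand the block over as immutable and drop our reference.
    immutable(T)[] assumeUnique()
    {
        assert(!sliced, "a mutable slice may still alias this block");
        auto result = cast(immutable(T)[]) buf[0 .. len];
        buf = null;
        len = 0;
        return result;
    }

    // Point 7: manual release; unsafe if anything still uses the block.
    void free()
    {
        if (buf.ptr !is null)
            GC.free(buf.ptr);
        buf = null;
        len = 0;
        sliced = false;
    }
}

unittest
{
    auto a = new NewArray!int([1, 2, 3]);
    auto b = a;                  // reference copy: a and b are one object
    b ~= 4;
    assert(a.length == 4 && a[3] == 4);

    int[] s = a;                 // implicit conversion to a slice
    assert(s == [1, 2, 3, 4]);

    auto c = new NewArray!int();
    foreach (i; 0 .. 5)
        c ~= i;
    immutable(int)[] frozen = c.assumeUnique();
    assert(frozen == [0, 1, 2, 3, 4]);
    assert(c.length == 0);
}
```

The unittest block can be run with dmd -unittest -main -run on the file.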