The demise of T[new]

Sun Oct 18 18:16:45 PDT 2009

dsimcha wrote:
> == Quote from Andrei Alexandrescu (SeeWebsiteForEmail at erdani.org)'s article
>>> 3.  A T[new] should be implicitly convertible to a slice.  For example:
>>>
>>> auto foo = someFunctionThatReturnsTnew();
>>> // foo is a T[new].
>>> T[] bar = someFunctionThatReturnsTnew();
>>> // Works.  bar is a T[].  The T[new] went into oblivion.
>>>
>>> This solves the problem of slices not being closed over .dup and ~.
>> Check.
> 
> So then why is slices not being closed over .dup, ~, etc. still a problem?  With
> implicit conversion, they for all practical purposes are.

The problems are with auto and template argument deduction.

>>> 4.  It should be guaranteed that no block of memory is ever referenced by more
>>> than one T[new] instance.  This is needed to guarantee safety when appending to
>>> immutable arrays, etc.
>> That doesn't go with reference semantics. Uncheck.
>>> 5.  Assigning a T[new] to another T[new] should be by reference, just like
>>> assigning a class instance to another class instance.
>> Check. (BTW contradicts 4)
> 
> There was a slight misunderstanding here.  When I say an instance, I mean one
> T[new] heap object == one instance, no matter how many pointers/references to this
> instance you have.  The assumption is that T[new] objects live entirely on the
> heap and have semantics similar to classes.  For example:
> 
> uint[new] foo = new uint[100];
> uint[new] bar = foo;  // Really just a pointer assignment.
> 
> Now, both foo and bar point to the same uint[new] instance.  If anything is
> modified via foo, it is seen when viewed from bar and vice-versa.
> 
> To clarify, no *main block of memory that holds the data in the array* is ever
> referenced by more than one T[new] *heap object*.

Oh yah. Check that sucker.

>>> Assigning a T[] to a T[new]
>>> should duplicate the memory block referenced by the T[] because this is probably
>>> the only way to guarantee (4).
>> No check, but could have been done.
> 
> Actually, this can even be done by COW.  Initially, assigning a T[] to a T[new]
> could just be a pointer assignment and setting a flag, as long as a copy of the
> data is made when you try to increase the length of the T[new] or append to it.
> Basically, as long as you use the T[new] as if it were a T[], no copying needs to
> be done.
> 
>>> 6.  Since T[new] guarantees unique access to a memory block, it should have an
>>> assumeUnique() method that returns an immutable slice and sets the T[new]'s
>>> reference to the memory block to null.  This solves the problem of building
>>> immutable arrays without the performance penalty of not being able to pre-allocate
>>> or the unsafeness of having to cowboy cast it to immutable.
>> Uncheck.
> 
> Now that I've explained my uniqueness thing better, does this sound feasible?

Nah, there are still problems with Ts that themselves are elaborate 
types containing references, aliasing etc. assumeUnique makes a rather 
strong claim.

>>> 7.  As long as the GC is conservative, there absolutely *must* be a method of
>>> manually freeing the memory block referenced by a T[new] provided that the GC
>>> supports this operation, though it doesn't have to be particularly pretty.  In
>>> general, since D is a systems language, T[new] should not be too opaque.  A good
>>> way to do this might be to make all of the fields of the T[new] public but
>>> undocumented.  If you *really* want to mess with it, you'll read the source code
>>> and figure it out.
>> Check while delete still exists. Please use malloc for that stuff.
> 
> As long as I can get at the pointer to the memory block held by T[new] I can pass
> it to GC.free.  All that would really be needed is a .ptr property that does
> something like what .ptr does for T[]s.

Alrighty. Check that.

>> So now: every place I've said "check" means there was implementation and
>> book-quality illustrated documentation that Walter and I have done with
>> the sweat of our brow. At the end we looked at the result and concluded
>> we should throw away all that work.
>> Andrei
> 
> This begs the question:  Why?  Walter's post on the subject was rather brief and I
> can't understand for the life of me why you guys would throw away such an elegant
> solution.

Here's what I wrote to Walter:

====================
I'm going to suggest something terrible - let's get rid of T[new]. I 
know it's difficult to throw away work you've already done, but really 
things with T[new] start to look like a Pyrrhic victory. Here are some 
issues:

* The abstraction doesn't seem to come off as crisp and clean as we both 
wanted;

* There are efficiency issues, such as the two allocations that you 
valiantly tried to eliminate in a subset of cases;

* Explaining two very similar but subtly different types to newcomers is 
excruciatingly difficult (I'll send you a draft of the chapter - it 
looks like a burn victim who didn't make it);

* Furthermore, explaining people when to use one vs. the other is much 
more difficult than it seems. On the surface, it goes like this: "If you 
need to append stuff, use T[new]. If not, use T[]." Reality is much more 
subtle. For one thing, T[new] does not allow contraction from the left, 
whereas T[] does. That puts T[] at an advantage. So if you want to 
append stuff and also contract from the left, there's nothing our 
abstractions can help you with.

Instead of all T[new] stuff, I suggest the following:

1. We stay with T[] and we define a struct ArrayBuilder that replaces 
T[new] with a much more clear name and charter. Phobos already has 
Appender which works very well. We can beef that up to allow array-like 
primitives.

2. Assigning to a slice's .length allocates a new slice if growth is needed.

3. Disallow ~= for slices. ArrayBuilder will define it.

4. That's it.

Java got away with a similar approach using StringBuilder:

http://java.sun.com/j2se/1.5.0/docs/api/java/lang/StringBuilder.html

Scala has something very similar called ArrayBuffer:

http://www.nabble.com/ArrayList-and-ArrayBuffer-td15448842.html

And guess what, C# stole Java's StringBuilder as well:

http://msdn.microsoft.com/en-us/library/2839d5h5%28VS.71%29.aspx

So it looks like many programmers coming from other languages will 
already be familiar with the idea that you use a "builder" to grow an 
array, and then you use a non-growable array. One thing that Appender 
has and is really cool is that it can grow an already-existing slice. So 
you can grow a slice, play with it for a while, and then grow it again 
at low cost. I don't think the other languages allow that.

I understand how you must feel about having implemented T[new] and all, 
but really please please try to detach for a minute and think back. Does 
what we've got now with T[new] make D a much better place? Between the 
increase of the language, the difficulty to explain the minute 
subtleties, and the annoying corner cases and oddities, I think it might 
be good to reconsider.
================

> Given that we already agree that a T[new] can be implicitly cast to a
> T[], the lack of closure under ~, .dup, etc. seems like a non-issue.  When I was a
> beginner, before I got into templates, a major attraction to D was that it was a
> fast, compiled language where basic things like arrays (mostly) just worked.  To
> take away all the syntactic sugar for appending and lengthening from arrays and
> push this into a separate library type that doesn't feel like a first class object
> or (I assume) even support most array semantics would be a massive, massive kludge
> no matter how well-implemented that library type was.

I think the language will be in considerably better shape without T[new] 
than with it. I understand this is a subjective point. You can argue 
against on virtually every one of my objections.

Andrei