The demise of T[new]
Andrei Alexandrescu
SeeWebsiteForEmail at erdani.org
Sun Oct 18 18:16:45 PDT 2009
dsimcha wrote:
> == Quote from Andrei Alexandrescu (SeeWebsiteForEmail at erdani.org)'s article
>>> 3. A T[new] should be implicitly convertible to a slice. For example:
>>>
>>> auto foo = someFunctionThatReturnsTnew();
>>> // foo is a T[new].
>>> T[] bar = someFunctionThatReturnsTnew();
>>> // Works. bar is a T[]. The T[new] went into oblivion.
>>>
>>> This solves the problem of slices not being closed over .dup and ~.
>> Check.
>
> So then why is slices not being closed over .dup, ~, etc. still a problem? With
> implicit conversion, they for all practical purposes are.
The problems are with auto and template argument deduction.
>>> 4. It should be guaranteed that no block of memory is ever referenced by more
>>> than one T[new] instance. This is needed to guarantee safety when appending to
>>> immutable arrays, etc.
>> That doesn't go with reference semantics. Uncheck.
>>> 5. Assigning a T[new] to another T[new] should be by reference, just like
>>> assigning a class instance to another class instance.
>> Check. (BTW contradicts 4)
>
> There was a slight misunderstanding here. When I say an instance, I mean one
> T[new] heap object == one instance, no matter how many pointers/references to this
> instance you have. The assumption is that T[new] objects live entirely on the
> heap and have semantics similar to classes. For example:
>
> uint[new] foo = new uint[100];
> uint[new] bar = foo; // Really just a pointer assignment.
>
> Now, both foo and bar point to the same uint[new] instance. If anything is
> modified via foo, it is seen when viewed from bar and vice-versa.
>
> To clarify, no *main block of memory that holds the data in the array* is ever
> referenced by more than one T[new] *heap object*.
Oh yah. Check that sucker.
>>> Assigning a T[] to a T[new]
>>> should duplicate the memory block referenced by the T[] because this is probably
>>> the only way to guarantee (4).
>> No check, but could have been done.
>
> Actually, this can even be done by COW. Initially, assigning a T[] to a T[new]
> could just be a pointer assignment and setting a flag, as long as a copy of the
> data is made when you try to increase the length of the T[new] or append to it.
> Basically, as long as you use the T[new] as if it were a T[], no copying needs to
> be done.
>
>>> 6. Since T[new] guarantees unique access to a memory block, it should have an
>>> assumeUnique() method that returns an immutable slice and sets the T[new]'s
>>> reference to the memory block to null. This solves the problem of building
>>> immutable arrays without the performance penalty of not being able to pre-allocate
>>> or the unsafeness of having to cowboy cast it to immutable.
>> Uncheck.
>
> Now that I've explained my uniqueness thing better, does this sound feasible?
Nah, there are still problems with Ts that themselves are elaborate
types containing references, aliasing etc. assumeUnique makes a rather
strong claim.
>>> 7. As long as the GC is conservative, there absolutely *must* be a method of
>>> manually freeing the memory block referenced by a T[new] provided that the GC
>>> supports this operation, though it doesn't have to be particularly pretty. In
>>> general, since D is a systems language, T[new] should not be too opaque. A good
>>> way to do this might be to make all of the fields of the T[new] public but
>>> undocumented. If you *really* want to mess with it, you'll read the source code
>>> and figure it out.
>> Check while delete still exists. Please use malloc for that stuff.
>
> As long as I can get at the pointer to the memory block held by T[new] I can pass
> it to GC.free. All that would really be needed is a .ptr property that does
> something like what .ptr does for T[]s.
Alrighty. Check that.
>> So now: every place I've said "check" means there was implementation and
>> book-quality illustrated documentation that Walter and I have done with
>> the sweat of our brow. At the end we looked at the result and concluded
>> we should throw away all that work.
>> Andrei
>
> This begs the question: Why? Walter's post on the subject was rather brief and I
> can't understand for the life of me why you guys would throw away such an elegant
> solution.
Here's what I wrote to Walter:
====================
I'm going to suggest something terrible - let's get rid of T[new]. I
know it's difficult to throw away work you've already done, but really
things with T[new] start to look like a Pyrrhic victory. Here are some
issues:
* The abstraction doesn't seem to come off as crisp and clean as we both
wanted;
* There are efficiency issues, such as the two allocations that you
valiantly tried to eliminate in a subset of cases;
* Explaining two very similar but subtly different types to newcomers is
excruciatingly difficult (I'll send you a draft of the chapter - it
looks like a burn victim who didn't make it);
* Furthermore, explaining people when to use one vs. the other is much
more difficult than it seems. On the surface, it goes like this: "If you
need to append stuff, use T[new]. If not, use T[]." Reality is much more
subtle. For one thing, T[new] does not allow contraction from the left,
whereas T[] does. That puts T[] at an advantage. So if you want to
append stuff and also contract from the left, there's nothing our
abstractions can help you with.
Instead of all T[new] stuff, I suggest the following:
1. We stay with T[] and we define a struct ArrayBuilder that replaces
T[new] with a much more clear name and charter. Phobos already has
Appender which works very well. We can beef that up to allow array-like
primitives.
2. Assigning to a slice's .length allocates a new slice if growth is needed.
3. Disallow ~= for slices. ArrayBuilder will define it.
4. That's it.
Java got away with a similar approach using StringBuilder:
http://java.sun.com/j2se/1.5.0/docs/api/java/lang/StringBuilder.html
Scala has something very similar called ArrayBuffer:
http://www.nabble.com/ArrayList-and-ArrayBuffer-td15448842.html
And guess what, C# stole Java's StringBuilder as well:
http://msdn.microsoft.com/en-us/library/2839d5h5%28VS.71%29.aspx
So it looks like many programmers coming from other languages will
already be familiar with the idea that you use a "builder" to grow an
array, and then you use a non-growable array. One thing that Appender
has and is really cool is that it can grow an already-existing slice. So
you can grow a slice, play with it for a while, and then grow it again
at low cost. I don't think the other languages allow that.
I understand how you must feel about having implemented T[new] and all,
but really please please try to detach for a minute and think back. Does
what we've got now with T[new] make D a much better place? Between the
increase of the language, the difficulty to explain the minute
subtleties, and the annoying corner cases and oddities, I think it might
be good to reconsider.
================
> Given that we already agree that a T[new] can be implicitly cast to a
> T[], the lack of closure under ~, .dup, etc. seems like a non-issue. When I was a
> beginner, before I got into templates, a major attraction to D was that it was a
> fast, compiled language where basic things like arrays (mostly) just worked. To
> take away all the syntactic sugar for appending and lengthening from arrays and
> push this into a separate library type that doesn't feel like a first class object
> or (I assume) even support most array semantics would be a massive, massive kludge
> no matter how well-implemented that library type was.
I think the language will be in considerably better shape without T[new]
than with it. I understand this is a subjective point. You can argue
against on virtually every one of my objections.
Andrei
More information about the Digitalmars-d
mailing list