The demise of T[new]
dsimcha
dsimcha at yahoo.com
Sun Oct 18 19:24:59 PDT 2009
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail at erdani.org)'s article
> dsimcha wrote:
> > == Quote from Andrei Alexandrescu (SeeWebsiteForEmail at erdani.org)'s article
> >>> 3. A T[new] should be implicitly convertible to a slice. For example:
> >>>
> >>> auto foo = someFunctionThatReturnsTnew();
> >>> // foo is a T[new].
> >>> T[] bar = someFunctionThatReturnsTnew();
> >>> // Works. bar is a T[]. The T[new] went into oblivion.
> >>>
> >>> This solves the problem of slices not being closed over .dup and ~.
> >> Check.
> >
> > So then why is slices not being closed over .dup, ~, etc. still a problem? With
> > implicit conversion, they for all practical purposes are.
> The problems are with auto and template argument deduction.
Ok, now I can see how this would be a legitimate problem, but only in a few corner
cases:

    void doStuff(T)(T someArrayLikeObject) {
        someArrayLikeObject = someArrayLikeObject[0..$ - 1];
    }

    int[] foo = someFunction();
    int[] bar = someFunction();
    auto baz = foo ~ bar;
    // baz is an int[new]. This doesn't usually matter, since you can use it
    // just like an int[] and if you care about the small performance difference,
    // you should be more careful.

    doStuff(baz);
    // Are the effects of doStuff() visible in this scope?
    // Depends if baz is an int[] or an int[new].

However, this is really, really, *really* a corner case. Any reasonable template
would either slice the array to make sure it owns the slice information or pass by
reference to make sure the change is propagated, so that there is no ambiguity.
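
As a rough sketch of those two unambiguous styles, written against plain int[] as it exists today (T[new] itself is not available to test with), something like the following should behave the same no matter what the argument type turns out to be. The names dropLastInPlace and withoutLast are made up for illustration:

```d
import std.stdio;

// Style 1: take the array by reference, so the caller always sees the change.
void dropLastInPlace(T)(ref T arr)
{
    arr = arr[0 .. $ - 1];
}

// Style 2: take a slice by value and hand back the shortened slice,
// so the function only ever touches its own slice information.
T[] withoutLast(T)(T[] arr)
{
    return arr[0 .. $ - 1];
}

void main()
{
    int[] foo = [1, 2, 3];
    int[] bar = [4, 5];
    auto baz = foo ~ bar;           // int[] in today's D

    auto shorter = withoutLast(baz);
    assert(baz.length == 5);        // the caller's view is untouched
    assert(shorter == [1, 2, 3, 4]);

    dropLastInPlace(baz);
    assert(baz == [1, 2, 3, 4]);    // the change is visible through ref
    writeln(baz);
}
```

Either way the caller can tell from the signature alone whether its own array will change.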
> Here's what I wrote to Walter:
> ====================
> I'm going to suggest something terrible - let's get rid of T[new]. I
> know it's difficult to throw away work you've already done, but really
> things with T[new] start to look like a Pyrrhic victory. Here are some
> issues:
> * The abstraction doesn't seem to come off as crisp and clean as we both
> wanted;
> * There are efficiency issues, such as the two allocations that you
> valiantly tried to eliminate in a subset of cases;
Once you've opened the can of worms of having to perform allocations, one versus
two allocations isn't very important.
> * Explaining two very similar but subtly different types to newcomers is
> excruciatingly difficult (I'll send you a draft of the chapter - it
> looks like a burn victim who didn't make it);
This is admittedly a legitimate concern. However, you can get off the ground in
D, or any language for that matter, without being a full-fledged language lawyer.
I frankly don't think it's important for beginners to understand every subtlety
and corner case. Heck, I've been using D for a while and have done some
non-trivial projects in it and there are definitely dark corners that I don't
understand. (Complex numbers come to mind.) I just don't care because I figure
I'll learn them if/when I need to know them.
> * Furthermore, explaining people when to use one vs. the other is much
> more difficult than it seems. On the surface, it goes like this: "If you
> need to append stuff, use T[new]. If not, use T[]." Reality is much more
> subtle. For one thing, T[new] does not allow contraction from the left,
> whereas T[] does. That puts T[] at an advantage. So if you want to
> append stuff and also contract from the left, there's nothing our
> abstractions can help you with.
Why can't a T[new] contract from the left? As far as I can tell, you could do
something like:

    struct TNew(T) {
        typeof(this) opAssign(T[] slice) {
            if(slice.ptr >= this.ptr && slice.ptr < this.ptr + capacity) {
                // Then we own this block of memory and can assign
                // a slice by reference.

                // Adjust effective capacity.
                capacity -= cast(size_t) (slice.ptr - this.ptr);
                length = slice.length;
                this.ptr = slice.ptr;
            } else {
                // Assign the slice by copying.
            }
            return this;
        }

        // Other stuff (the ptr, length and capacity members, etc.).
    }

You would then contract from the left the same way as for a slice:

    int[new] foo = new int[5];
    foo = foo[1..$];

It would simply require a few integer/pointer comparisons and still be reasonably
efficient.
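
To make the idea concrete, here is a minimal library-level sketch that compiles with today's D. It only stands in for T[new]; the member names (ptr, length, capacity), the data() accessor and the constructor are assumptions made for illustration, not the actual T[new] implementation:

```d
// Hypothetical stand-in for T[new]; all names here are illustrative.
struct TNew(T)
{
    T* ptr;
    size_t length;
    size_t capacity;

    this(size_t n)
    {
        auto block = new T[n];
        ptr = block.ptr;
        length = n;
        capacity = n;
    }

    // View of the live elements as an ordinary slice.
    T[] data() { return ptr[0 .. length]; }

    void opAssign(T[] slice)
    {
        if (ptr !is null && slice.ptr >= ptr && slice.ptr < ptr + capacity)
        {
            // The slice points into our own block: adopt it by reference
            // and shrink the effective capacity by the skipped elements.
            capacity -= cast(size_t)(slice.ptr - ptr);
            length = slice.length;
            ptr = slice.ptr;
        }
        else
        {
            // Foreign memory: assign by copying.
            auto copy = slice.dup;
            ptr = copy.ptr;
            length = copy.length;
            capacity = copy.length;
        }
    }
}

void main()
{
    auto foo = TNew!int(5);
    auto view = foo.data();
    foreach (i, ref x; view)
        x = cast(int) i;            // contents are 0, 1, 2, 3, 4

    auto oldPtr = foo.ptr;
    foo = foo.data()[1 .. $];       // contract from the left
    assert(foo.ptr == oldPtr + 1);  // same block, no copy
    assert(foo.data() == [1, 2, 3, 4]);
    assert(foo.capacity == 4);

    int[] other = [9, 8];
    foo = other;                    // not our memory, so it is copied
    assert(foo.ptr !is other.ptr);
    assert(foo.data() == [9, 8]);
}
```

The ownership test is just the two pointer comparisons from the snippet above, plus a null check for a default-initialized struct.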
> Instead of all T[new] stuff, I suggest the following:
> 1. We stay with T[] and we define a struct ArrayBuilder that replaces
> T[new] with a much more clear name and charter. Phobos already has
> Appender which works very well. We can beef that up to allow array-like
> primitives.
> 2. Assigning to a slice's .length allocates a new slice if growth is needed.
> 3. Disallow ~= for slices. ArrayBuilder will define it.
> 4. That's it.
> Java got away with a similar approach using StringBuilder:
> http://java.sun.com/j2se/1.5.0/docs/api/java/lang/StringBuilder.html
> Scala has something very similar called ArrayBuffer:
> http://www.nabble.com/ArrayList-and-ArrayBuffer-td15448842.html
> And guess what, C# stole Java's StringBuilder as well:
> http://msdn.microsoft.com/en-us/library/2839d5h5%28VS.71%29.aspx
> So it looks like many programmers coming from other languages will
> already be familiar with the idea that you use a "builder" to grow an
> array, and then you use a non-growable array. One thing that Appender
> has and is really cool is that it can grow an already-existing slice. So
> you can grow a slice, play with it for a while, and then grow it again
> at low cost. I don't think the other languages allow that.
> I understand how you must feel about having implemented T[new] and all,
> but really please please try to detach for a minute and think back. Does
> what we've got now with T[new] make D a much better place? Between the
> increase of the language, the difficulty to explain the minute
> subtleties, and the annoying corner cases and oddities, I think it might
> be good to reconsider.
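
For reference, growing an already-existing slice with Phobos' Appender looks roughly like this in current D. This is only a sketch of the usage being described; growExisting is an illustrative name, not part of Phobos:

```d
import std.array : appender;

// Start from an existing slice, append to it, and return a plain slice.
int[] growExisting(int[] slice)
{
    auto builder = appender(slice);
    builder.put(4);                 // append a single element
    builder.put([5, 6]);            // append a range of elements
    return builder.data;            // back to an ordinary int[]
}

void main()
{
    int[] start = [1, 2, 3];
    auto grown = growExisting(start);
    assert(grown == [1, 2, 3, 4, 5, 6]);
    assert(start == [1, 2, 3]);     // the original slice keeps its own length
}
```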
Yes, Java, etc. get away with this because Java makes no pretense of being a
"make simple things simple" kind of language, but one big thing that D has over Java,
etc. is that it can be used almost like a scripting language for simple stuff. It
is about the only language that scales well all the way from simple scripts to
uber-complicated metaprogramming. If we get rid of nice builtin arrays that
support everything with clean syntax, we're throwing out making simple things
simple in exchange for fixing a few corner cases. IMHO this is a terrible
tradeoff. If you and Walter *really* despise T[new], I would prefer to see
builtin arrays kept exactly the way they are, bugs and all, and for array
builders/appenders/whatever to be improved but still be considered purely a
performance hack. The bugs in the current arrays are pretty nasty from a
theoretical safety/purity point of view (esp. the one that's a hole in
immutability), but are seldom run into in practice.