Enhanced array appending

Fri Dec 25 03:07:32 PST 2009

Leandro Lucarella Wrote:

> Steven Schveighoffer, el 24 de diciembre a las 12:59 me escribiste:
> > In fact, semantics are simpler.  You can completely eliminate the
> > explanation about stomping in the spec.
> 
> What about the cache? You have to explain to the users that there is
> a possibility that their appends works blazing fast, except... well,
> sometimes if you fill the cache they will suck. As gruzone said, find this
> kind of weird behaviours will be very hard.

I don't think you do.  We don't explain currently in the spec that if you append to 2 arrays in lockstep, performance drops off a cliff, I don't see why we should have to explain rare corner cases for the new code either.

BTW, after you start appending to several arrays in lockstep, the bottleneck seems to shift away from the append function and to something else -- no amount of cache helps.  I think its because the array pages are all interleaving and so cannot be extended.

> > No example?
> 
> No, I said that I think arrays/slices as they are have better performance
> (when you know exactly what you're doing and pre-allocate and do that kind
> of tricks). Unless you are asking for some other kind of examples?

I meant *any* kind of gain -- performance or otherwise.  If you have no examples where separated slices perform better or are not as awkward as combined slices/arrays, then I don't see why you'd want them.  What I'm looking for is:

This code sucks with combined arrays/slices:

....

Now, look how good it is with split arrays/slices:

....

> > You think splitting the features into 2 different types with
> > several overlapping features is easier to explain?
> 
> Yes. And I think they have overlapping features just like any other
> 2 containers, hashes and arrays have overlapping features, but they are
> completely different beasts. Same goes for arrays and slices.
> 
> PHP have arrays and hashes combined in one type. You think that's a good
> idea because arrays and hashes have overlapping features?

I hate to say this, and it has nothing to do with this argument, but I'm actually pretty impressed with php's arrays.  The rest of php, not so much.  They perform like hashes, but are ordered and sortable.  I tried to find out how they are implemented, but the code is so buried in the zend engine that I didn't know where to look.

> I really don't,
> and I do think arrays and slices are 2 completely different beasts that
> needs to be separated in the language. I think that's the natural good
> thing to do, not an optimization. I think D is broken by combining this
> 2 types into one.

They are only different because you say they are.  To me they are no different.  In fact, I don't even think we need to distinguish between arrays and slices, they should just all be arrays.  The result of slicing an array is -- another array.  I don't see the issue with that.

> You don't think that (and it looks like Andrei and
> Walter doesn't think that either) so there's no way we would agree on this
> topic.

Probably not.

> > I think that is why Andrei nixed the T[new] idea, because when he tried
> > to explain it in TDPL, it was very unnatural.
> 
> Well, I don't know what Andrei tried to do, but I think Python does pretty
> well with it, for example.

I don't have any experience with python, so I can't really say anything about it.

> > I think the whole basis of your argument is invalid -- the current
> > array/slice combination type is easier to use and explain *and* performs
> > better.
> 
> I can say exactly the same about you. That's why I said I don't argue
> anymore about this (grrr! I don't know why I ended up doing it in this
> thread). This is a matter of taste, you think array/slice combined are
> easier to use and I don't.

I'd love to see an example where you think this is the case.  If it's simply a matter of taste, then you have no argument.  If that's the case, it's just a personal choice, and at that point it's up to Walter.

> About performance, they only spot where
> splitting arrays and slices would perform worse than the current
> slice/arrays combined is in that you add an extra indirection for arrays
> because of the reference semantics (but you need to pass around a word
> less than with slices, so I don't know how the real impact of that would
> be).

If you were to make arrays a true reference type, I don't think you would have an allocated block of metadata pointing to a separate allocated block of data, I'd expect all to be in one block.  In that case, you don't have 2 levels of indirection.

I don't necessarily think that reference based arrays would perform much worse than the current scheme for normal usages, but the performance problem I see is when you want to append to a slice, you not only have the runtime reallocating when it may not need to (if the slice is at the end of an allocated block), but you have to spell out that you're changing the slice to an array by changing the type *before* appending.  Then it's on you to make sure the reallocation is necessary.  e.g. if you are appending one slice to another, and the slice being appended is 0 length at runtime, in order to avoid a costly reallocation, you have to check for this at runtime.

I just don't see why you think it's easier.

-Steve