Enhanced array appending

Thu Dec 24 09:20:31 PST 2009

grauzone Wrote:

> Steven Schveighoffer wrote:
> > Yes, but the problem I see with it is you are passing around that array_info pointer with the slice for *all* slices, not just ones you intend to use appending on.  There is plenty of code that uses arrays and slices without ever doing appending, so that extra pointer is wasted space, and also increases the cost of passing slices by value.
> 
> Suddenly having an additional field in the slice is the problem? Wasn't 
> semantics the problem (see your first reply to me above in this thread).

In my original reply, I focused on the fact that your original solution *doesn't work*.  Yes, it also suffered from the extra baggage, but that was a lesser problem.

> 
> Is having an additional field in the slice really a problem?
> 
> You could just provide a "light" slice type. This wouldn't have the 
> additional helper field, and would consist only of a pointer and length 
> field. You could make it implicitly convertible to normal slices.

Yeah, all this is possible.  Whether it's better is subject to opinion or testing.

> I see no problem with adding an additional field to usual slices. 
> There's also some benefit: clearer semantics (if you make guaranteed 
> performance part of the semantics) and a much simpler runtime.

Simpler runtime is not any benefit IMO.  Nobody cares how array appending works, they just know the semantics.  The semantics wouldn't be any different with your type, you still have all the same semantics, so I don't really get that point.

> > Your idea is essentially the idea behind my patch, except you can get the array info without having an additional pointer.  The cost of lookup for the block info is mitigated by the cache, making lookups cost very little.
> 
> But the cache is volatile, and a cache entry could be easily flushed by 
> doing many array append operations.  A single string processing function 
> could replace the entire cache with entries for string temporaries, and 
> cause nasty performance regressions in unrelated parts of the program 
> (e.g. in places where you really don't want each append operation to 
> reallocate the full array).

I'm not sure you understand the semantics.  When an array's block info is in the cache, no lookup of the block info from the GC is required, so the lock isn't taken, and the lookup is as simple as a linear search through 8 elements.  But when it is not in the cache, it is looked up via the GC, requiring a lock and a look through all the memory pools (I'm unsure of the performance, but it clearly is slower).  However, a cache miss does not mean an automatic reallocation, it just means the lookup of the block info is slower.  It can *still* be extended into the block.  Basically, the algorithm performance of appending with a cache miss is the same as a cache hit, just with a larger constant.

> The worst is, this behavior will be 
> relatively indeterministic. It's also hard to explain to "normal" 
> programmers.
> 
> Also, isn't it completely useless? If the programmer wants to append to 
> an array, he'll use an append struct just to be sure, and if he doesn't 
> want to append, inefficient array slice appending wouldn't be a problem.

Again, I think these points may be fostered by a misunderstanding of the patch.  The performance is much better than current performance, and people have had no problem dealing with the current runtime except in special circumstances.

-Steve