Enhanced array appending

Thu Dec 24 08:26:07 PST 2009

Steven Schveighoffer wrote:
> grauzone Wrote:
> 
>> Steven Schveighoffer wrote:
>>> The problem is that adding a capacity field only works if the object is a reference type.  A value type will make either choose to make copies of the capacity field, resulting in the same stomping problem (and maybe worse), or will zero out one of the copy's capacity field, resulting in an artificial limitation for appending.  The only true way to make this work right is to have a capacity field shared among all the aliases to the same data, and the most logical place for that is in the allocated block.
>> I see... Go makes slices value types (which references the array data), 
>> but I'm not sure how or if they handle this situation.
>>
>> Then, why use a cache to find the pointer to the capacity field inside 
>> the array memory block? You could just store a pointer to the capacity 
>> field in the slice.
>>
>> Something like this, I guess:
>>
>> struct slice {
>>      void* ptr;
>>      size_t length;
>>      array_info* array;
>> }
>>
>> [snip]
>>
>> Would this work?
> 
> Yes, but the problem I see with it is you are passing around that array_info pointer with the slice for *all* slices, not just ones you intend to use appending on.  There is plenty of code that uses arrays and slices without ever doing appending, so that extra pointer is wasted space, and also increases the cost of passing slices by value.

Suddenly having an additional field in the slice is the problem? Wasn't 
semantics the problem (see your first reply to me above in this thread).

Is having an additional field in the slice really a problem?

You could just provide a "light" slice type. This wouldn't have the 
additional helper field, and would consist only of a pointer and length 
field. You could make it implicitly convertible to normal slices.

These light slices could be used by someone who needs the additional 
performance so badly. Actually, they could simply be implemented as 
library type.

I see no problem with adding an additional field to usual slices. 
There's also some benefit: clearer semantics (if you make guaranteed 
performance part of the semantics) and a much simpler runtime.

> Your idea is essentially the idea behind my patch, except you can get the array info without having an additional pointer.  The cost of lookup for the block info is mitigated by the cache, making lookups cost very little.

But the cache is volatile, and a cache entry could be easily flushed by 
doing many array append operations. A single string processing function 
could replace the entire cache with entries for string temporaries, and 
cause nasty performance regressions in unrelated parts of the program 
(e.g. in places where you really don't want each append operation to 
reallocate the full array). The worst is, this behavior will be 
relatively indeterministic. It's also hard to explain to "normal" 
programmers.

Also, isn't it completely useless? If the programmer wants to append to 
an array, he'll use an append struct just to be sure, and if he doesn't 
want to append, inefficient array slice appending wouldn't be a problem.

You could avoid these issues just by adding a new field to the slice struct.

Needless to say, T[new] would be the correct solution here. It also has 
cleaner semantics. Yes, I heard it sucked and it was considered a 
failure by Walter/Andrei, but we still don't know why.

> The patch is a compromise between performance for appending and performance for everything else.  I think there will still be a need in specialized code for a fatter array type that allows appending without complicated GC lookups, but the patch provides safe appending for a reasonable performance hit without affecting other aspects of array usage.  As with all compromises, there are always different ways of doing things, so time will tell if this patch is the best method.
> 
> -Steve