Possible change to array runtime?

Steven Schveighoffer schveiguy at yahoo.com
Thu Mar 13 08:24:01 PDT 2014


Buried inside another large thread, Don and Dicebot complained that D2's
prevention of array stomping was the biggest breaking change in D2: not
because it breaks code, but because it silently destroys performance.

A little background for people who aren't aware:

1. D1 took the position that if an array slice started at the beginning of  
a block, and you appended to that slice, it would extend in-block no  
matter what else was referencing that data.

In other words:

char[] arr = "abc"; // this was ok in D1, which did not have the immutable  
keyword
char[] arr2 = arr[0..1];
arr2 ~= "de";

assert(arr == "ade"); // stomped!

This was a "feature", not a bug, because the idea is that once you have
built a large or appropriately sized buffer, you may want to forget its
contents and re-use it by doing:

arr.length = 0;
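
A typical D1-style processing loop, then, looked something like this (a
minimal sketch; lines and process are hypothetical names, not anything
from a real codebase):

char[] buf;
foreach (line; lines)
{
    buf.length = 0;   // forget the contents, keep the allocation (D1 behavior)
    buf ~= line;      // appends re-use the same block every iteration
    buf ~= '\n';
    process(buf);
}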

2. D2 added a mechanism to prevent stomping, especially in the case of  
immutable, by storing the "used" length of a block inside the block itself.

This was done by yours truly. At the same time, I also improved the
performance of array appending by exploiting the fact that D2 has
thread-local storage by default, which avoids taking a global GC lock.
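
Under these rules, the first example above no longer stomps: the runtime
sees that the block holds more data than the slice being appended to, and
copies to a new block instead. A minimal D2 sketch:

char[] arr = "abc".dup;   // .dup because string literals are immutable in D2
char[] arr2 = arr[0..1];
arr2 ~= "de";             // the block's stored "used" length is 3, so the
                          // append reallocates instead of stomping
assert(arr == "abc");     // original data preserved
assert(arr2 == "ade");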

However, in any code ported from D1, the following line still compiles and
works fine:

arr.length = 0;

But it has a vastly different meaning. Instead of re-using the block, the
next append *abandons* it and allocates a new one, leaving the original
block behind as garbage if nothing else still points at it.
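
Concretely, something like this happens (a rough sketch):

auto buf = new char[](100);
auto p = buf.ptr;
buf.length = 0;        // the slice keeps its pointer, but the block still
                       // records a "used" length of 100
buf ~= 'x';            // extending in-block could stomp data other slices
                       // might see, so a new block is allocated
assert(buf.ptr !is p);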

3. Don's company uses D1 as its language. I highly recommend watching
Don's Dconf13 presentation (and look forward to his Dconf14 one!) to see
how effective D code can achieve unbelievable speed, especially where
array slices are concerned. But in D2, to get the same behavior from the
line above, they must add the following code after it:

arr.assumeSafeAppend();

This is an "unsafe" operation which says, "I don't care that there is
still data in there that other slices may refer to; just throw it away."
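
A D2 port of the re-use idiom then looks something like this (sketch):

auto buf = new char[](100);
auto p = buf.ptr;
buf.length = 0;
buf.assumeSafeAppend();   // declare the old contents stompable
buf ~= "new data";        // extends in-block again, no allocation
assert(buf.ptr is p);     // same block as before (given enough capacity)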

And in fact, I believe the performance would improve over D1 if they did  
that. But at the moment, just "switching to D2" will cause unacceptable  
performance loss.

However, there may be a way to keep the original behavior in the right
circumstances. It's a tad hacky, but makes a lot of sense. So here goes:

Proposal: iff you are setting the length of an array via the length  
property AND you are setting the length to 0 AND the array starts at the  
beginning of the block AND the array is mutable (not const or immutable),  
THEN assumeSafeAppend is called on that array AUTOMATICALLY.
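
Under the proposal (and only under it; this is not what D2 does today),
the D1 idiom would again re-use the block:

char[] buf = new char[](100);
auto p = buf.ptr;
buf.length = 0;       // proposed: implicitly calls assumeSafeAppend, since
                      // buf is mutable and starts at the beginning of its block
buf ~= "reused";      // would extend in-block, as in D1
assert(buf.ptr is p);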

Rationale for this not affecting existing code:

If you mean to abandon an array, leaving it to any other references, are
you more likely to use arr = null or arr.length = 0? The former disavows
all claims to the array, allows it to be garbage collected if no other
references exist, and is shorter to type. arr.length = 0 retains the
pointer to the array, which keeps the block in memory, but gives you no
access to the existing data. The only valid operation on that array at
that point is to append, which is going to reallocate anyway, so there was
no reason to keep a pointer to the original array.

I think in the vast majority of cases where one intends the array to
reallocate, they would use arr = null. In the other cases, I would argue,
they don't know what they are doing: they incorrectly think the array is
going to take over the block again. In that case, they likely don't care
if the existing data gets stomped.

The proposal does not remove the current behavior, either; you can still
get it with arr = arr[0..0];
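
Roughly, the three forms would then mean (a sketch of the proposed
semantics, not current behavior):

// given some mutable char[] arr that starts at the beginning of its block:
arr = null;          // abandon the array entirely; the GC may collect the block
arr = arr[0..0];     // keeps today's meaning: the next append reallocates
arr.length = 0;      // proposed: implicit assumeSafeAppend, the block is re-used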

The reason we disallow it on const or immutable arrays is simply to
prevent implicit overwriting of immutable data. In the cases where you are
building up a buffer, and especially in D1 code being ported to D2, const
and immutable would not be involved.

I think this would "fix" the problem Sociomantic has, and not break any  
existing code that wasn't already broken.

The single largest drawback that I can think of is that the behavior would  
be different in a very small number of cases. Are those cases valid? Is  
this enough to reject the proposal?

Destroy.

-Steve

