Possible change to array runtime?
Steven Schveighoffer
schveiguy at yahoo.com
Thu Mar 13 08:24:01 PDT 2014
Buried inside another large thread, Don and Dicebot complained about D2's
prevention of array stomping as being the biggest breaking change for D2,
not that it breaks code, but silently destroys performance.
A little background for people who aren't aware:
1. D1 took the position that if an array slice started at the beginning of
a block, and you appended to that slice, it would extend in-block no
matter what else was referencing that data.
In other words:
char[] arr = "abc"; // this was ok in D1, which did not have the immutable
keyword
char[] arr2 = arr[0..1];
arr2 ~= "de";
assert(arr == "ade"); // stomped!
This was a "feature", not a bug, because the idea is, once you have built
a large or appropriately sized buffer, you may want to forget what it had,
and re-use it by doing:
arr.length = 0;
2. D2 added a mechanism to prevent stomping, especially in the case of
immutable, by storing the "used" length of a block inside the block itself.
This was done by yours truly, and at the same time, I also improved
performance of array appending by exploiting the fact that D2 has
thread-local storage by default to avoid a global GC lock.
However, any code that is compiled from D1, the following line still works
fine:
arr.length = 0;
But has a vastly different meaning. Instead of re-using the block, it
*abandons* the block, and allocates a new one. This leaves the original
block as garbage, especially if nothing is left pointing at the original
block.
3. Don's company uses D1 as its language, I highly recommend watching
Don's Dconf13 presentation (and look forward to his Dconf14 one!) to see
how effective D code can create unbelievable speed, especially where array
slices are concerned. But to the above line, in D2, they must add the
following code to get the same behavior:
arr.assumeSafeAppend();
This is an "unsafe" operation which says, "I don't care that there is
still data in there, that other slices may refer to, just throw it away."
And in fact, I believe the performance would improve over D1 if they did
that. But at the moment, just "switching to D2" will cause unacceptable
performance loss.
However, there may be a way to keep the original behavior in the right
circumstances. It's a tad bit hacky, but makes a lot of sense. So here
goes:
Proposal: iff you are setting the length of an array via the length
property AND you are setting the length to 0 AND the array starts at the
beginning of the block AND the array is mutable (not const or immutable),
THEN assumeSafeAppend is called on that array AUTOMATICALLY.
rationale for this not affecting existing code:
If you mean to abandon an array, left to other references, are you more
likely to use arr = null, or arr.length = 0? The former disavows all
claims to the array, allows it to be garbage collected if no other
references exist, and is shorter to type. arr.length = 0 retains the
pointer to the array, which keeps it in memory, but it has no access to
the existing data. The only valid operation on that array at that point is
to append, which means you are going to reallocate anyway, there was no
reason to keep a pointer at the original array.
I think in vast majority of cases, where one intends the behavior to
reallocate, they would use arr = null. In the other cases, I would argue,
they don't know what they are doing, they incorrectly think it's going to
take over the array again. In that case, they likely don't care if the
existing data gets stomped.
It does not disallow the original operation, you can do the same thing
with arr = arr[0..0];
The reason we disallow it on const or immutable is simply to prevent
implicit overwriting of immutable data. In cases where you have a buffer
you are building, and especially in cases of D1 code porting to D2, const
and immutable would not exist.
I think this would "fix" the problem Sociomantic has, and not break any
existing code that wasn't already broken.
The single largest drawback that I can think of is that the behavior would
be different in a very small number of cases. Are those cases valid? Is
this enough to reject the proposal?
Destroy.
-Steve
More information about the Digitalmars-d
mailing list