D array expansion and non-deterministic re-allocation

Tue Dec 1 04:34:58 PST 2009

On Thu, 26 Nov 2009 17:45:30 -0500, Bartosz Milewski  
<bartosz-nospam at relisoft.com> wrote:

> Steve, I don't know about you, but this exchange clarified some things  
> for me. The major one is that it's dangerous to define the semantics of  
> a language construct in terms of implementation. You were defending some  
> points using implementation arguments rather than sticking to defined  
> semantics.

I was defending the semantics by using an example implementation.  I was  
not defining the semantics in terms of implementation.  The semantics are  
defined by the spec, and do not indicate when an array is reallocated and  
when it is not.  That detail is implementation defined.  My examples use  
dmd's implementation to show how the assumption can break.  You said the
guy needs me to show him that it is broken, since all his tests pass.  Why
can't I use my knowledge of the implementation to come up with an example?

I could rewrite my statements as:  "You should not rely on the array being  
reallocated via append, because D does not guarantee such reallocation.   
Using the reference implementation of dmd, it is possible to come up with  
an example of where this fails: ..."

> We have found out that one should never rely on the array being  
> re-allocated on expansion, even if it seems like there's no other way.  
> The only correct statement is that the freshly expanded part of the  
> array is guaranteed not to be write-shared with any other array.

I agree with this, except for "even if it seems like there's no other
way."  The spec says an allocation always occurs when you do a ~ b, so you
can always rewrite a ~= b as a = a ~ b.  In fact, at one point, to avoid
stomping, I went through Tango, found all the places where append could
result in stomping, and changed the code this way.  There were probably
fewer than 5 instances.  Append is not a very common operation when you
didn't create the array to begin with.
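
To illustrate the rewrite, here is a minimal sketch.  The helper name
appendFresh is made up for this example; the only thing it relies on is
the spec's rule that a ~ b always allocates a new array.

```d
// Hypothetical helper (not from Tango or Phobos): append by concatenation,
// so the result never shares storage with `a`.
int[] appendFresh(int[] a, int[] b)
{
    return a ~ b;   // concatenation always yields a freshly allocated array
}

void main()
{
    int[] original = [1, 2, 3];
    int[] a = original;          // a aliases original's elements

    a = appendFresh(a, [4]);     // same effect as: a = a ~ [4];
    a[0] = 99;                   // writes go to the new array only

    assert(original == [1, 2, 3]);
    assert(a == [99, 2, 3, 4]);
    assert(a.ptr !is original.ptr);
}
```

The cost is an allocation and a copy on every append, so this is only
worth doing where the array was handed to you by someone else.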

> However, this discussion veered away from a more important point. I  
> don't believe that programmers will consciously make assumptions about  
> re-allocation breaking sharing.

For the most part, this is ok -- rarely do you see someone append to an  
array they didn't create *and* modify the original data.

My belief is that people are more likely to expect that appending to an
array *doesn't* reallocate.  If you have experience in programming, the language
you are used to either treats arrays as value types or as reference  
types.  I don't think I've ever seen a language besides D that uses the  
hybrid type for arrays.  So you are going to come to D expecting value or  
reference.  If you expect value, you should quickly learn that's not the  
case because 99% of the time, arrays look like reference types.  It is  
natural then to expect appending to an array to affect all other aliases
of that array; after all, it is a reference type.  I just think your
examples don't ring true in practice because there are simpler ways to  
guarantee allocation.  You have to go out of your way to write bad code  
that doesn't work correctly.
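
A small sketch of the hybrid behavior described above.  The helper
sharesStorage is a name invented for this example.  The code deliberately
does not assume whether the append reallocates, since that part is up to
the implementation; it only checks what follows either way.

```d
// Hypothetical helper: true if both slices start at the same address.
bool sharesStorage(const int[] x, const int[] y)
{
    return x.ptr is y.ptr;
}

void main()
{
    int[] a = [1, 2, 3];
    int[] b = a;                 // b refers to the same elements as a

    b[0] = 99;                   // element writes look like reference semantics
    assert(a[0] == 99);

    b ~= 4;                      // but length is per-slice: a does not grow
    assert(a.length == 3);
    assert(b.length == 4);

    // Whether b still shares storage with a after the append is not
    // specified, so record it instead of assuming either answer.
    immutable stillShared = sharesStorage(a, b);

    b[1] = -1;
    // The write shows up in a exactly when the storage is still shared.
    assert((a[1] == -1) == stillShared);
}
```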

Finally, it's easy to turn an array into a reference type when passing it
as a parameter: just use the ref decorator.  All we need is a way to turn it
into a value type, and I think Andrei's idea of Value!(arr) would be great  
for that.
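
For the ref half of that, a minimal sketch follows.  The function names
are invented for the example.  Value!(arr) is only a proposal, so it is
not shown; the last lines use the existing .dup property as a stand-in
for getting an independent copy, which is my assumption about the intent
and not the proposed design.

```d
// Hypothetical functions for illustration only.
void appendByValue(int[] arr)
{
    arr ~= 4;        // grows the callee's slice; the caller's length is unchanged
}

void appendByRef(ref int[] arr)
{
    arr ~= 4;        // grows the caller's slice itself
}

void main()
{
    int[] a = [1, 2, 3];

    appendByValue(a);
    assert(a.length == 3);       // caller does not see the append

    appendByRef(a);
    assert(a == [1, 2, 3, 4]);   // caller sees the append

    int[] copy = a.dup;          // explicit copy: no shared elements
    copy[0] = 99;
    assert(a[0] == 1);
}
```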

> The danger is that it's easy to miss accidental sharing and it's very  
> hard to test for it.

I think this danger is rare, and it's easy to search for (just search for
~= in your code; I did it with Tango).  I think it can be very well
defined in a tutorial or book chapter.

-Steve