D array expansion and non-deterministic re-allocation

Wed Nov 25 13:44:47 PST 2009

Bartosz Milewski wrote:
> Steven Schveighoffer Wrote:
> 
>> Bartosz Milewski Wrote:
>>
>>> Steven Schveighoffer Wrote:
>>>
>>>
>>>> Bottom line: if a function isn't supposed to change the buffer, the  
>>>> signature should be const for that parameter.  It's one of the principles  
>>>> of const, and why it's in D2 in the first place.  I'd explain to the coder  
>>>> that he is wrong to expect that modifying a buffer in a function that  
>>>> isn't supposed to modify a buffer is acceptable (and hopefully he gets it,  
>>>> or else I don't have time to deal with people who insist on being right  
>>>> when they are not).
>>>>
>>>> BTW, in my experience, the newbie expectation of ~= is usually that it  
>>>> modifies the original even when it doesn't, not the other way around.
>>>>
>>> The guy insists that the reallocation happens in the quote function (otherwise there would be stomping), so no sharing happens and the original is not modified. He tested it on a variety of inputs! I'm not totally making it up--I have had this kind of argument with C++ programmers. You tell them that it's safer to use const and they laugh at you. "My code works, why should I change it!"
>> C++ const is kind of a joke :)
>>
>> But in any case, it's trivial to come up with a case that causes buffer issues.  If he insists on a test case, then I'd give him one.
>>
>> Programmers like this don't last long at companies.  I had a coworker once who insisted that a bug he was working on was going to be impossible to fix, so there was no point in working on it.  My boss said that if he (the boss) could fix it, the coworker would be let go.  My boss fixed it in one day.
>>
>> And there are *tons* of logic errors that don't cause bugs for most inputs, I don't see why we are focusing on this one case.
>>
> 
> The problem here is that the guy is sort of right. Indeed quote() expands the slice before overwriting it and that requires re-allocation to avoid stomping over the original buffer. There is however a boundary case, when the whole buffer matches the pattern. Then the expansion is done in place and the buffer is modified. This would never happen if expansion were guaranteed to re-allocate. 
> 
> What this example was supposed to illustrate is that it's easy to forget to explicitly duplicate an array, especially if it's not clear who's responsible--the caller or the callee. 
> 
> This is somewhat reminiscent of the situation in C++ and the responsibility to delete an object. Interestingly, in C++ programs you can use unique_ptr's to have deterministic deletion. In D you could also use uniqueness to have deterministic reallocation (or rather avoid it when it's not observable). 

I agree that the underlying array sharing can often be a hassle. This is 
a problem that goes beyond just expansion - it's all the implied 
aliasing that goes on whenever you pass arrays around.
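
To make the aliasing concrete, here is a minimal sketch in D. The helper
sameStorage is a hypothetical name introduced only for this illustration.
Whether the final append extends the block in place or reallocates depends
on the runtime and on the block's spare capacity, so the sketch prints the
outcome instead of assuming one.

```d
import std.stdio;

// Hypothetical helper (not from the thread): true when both slices
// begin at the same address, i.e. they view the same storage.
bool sameStorage(const(int)[] x, const(int)[] y)
{
    return x.ptr is y.ptr;
}

void main()
{
    int[] a = [1, 2, 3];
    int[] b = a;                 // copies pointer and length only
    assert(sameStorage(a, b));

    b[0] = 42;                   // a write through b ...
    assert(a[0] == 42);          // ... is visible through a

    int[] c = a.dup;             // explicit duplicate: fresh storage
    c[0] = 7;
    assert(a[0] == 42);          // a is unaffected by writes to c

    b ~= 4;                      // may extend in place or reallocate
    writeln(sameStorage(a, b) ? "append kept the storage"
                              : "append reallocated");
}
```

Only the explicit .dup gives a guarantee here; everything else depends on
who else happens to hold a slice of the same block.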

How about creating a struct Value!T that transforms T (be it an array or 
a class) into a value type? Then if you use Value!(int[]), you're 
effectively dealing with values throughout (even though internally they 
might be COWed). Sometimes I also see a need for DeepValue!T which would 
e.g. transitively duplicate arrays of arrays and full object trees. For 
the latter we need some more introspection though. But we have 
everything in the language to make Value and DeepValue work with arrays.
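
A rough sketch of what Value!T could look like for arrays follows. Nothing
here is a settled design: the member names (payload, view) and the decision
to copy eagerly in the postblit rather than do real COW are assumptions made
only to keep the example short. It handles arrays with mutable elements
only, and does not attempt DeepValue.

```d
import std.stdio;

// Hypothetical sketch: Value!T is only a proposal in this thread.
// Eager copying stands in for copy-on-write (an assumption).
struct Value(T : E[], E)
{
    private T payload;

    // Take a private copy so the caller's array is never aliased.
    this(T initial) { payload = initial.dup; }

    // Postblit: runs after every struct copy and un-shares the storage.
    this(this) { payload = payload.dup; }

    // Element access by reference, so v[i] = x works.
    ref E opIndex(size_t i) { return payload[i]; }

    // v ~= x appends to this value's own storage.
    void opOpAssign(string op)(E x) if (op == "~") { payload ~= x; }

    @property size_t length() const { return payload.length; }

    // Read-only view for inspection; cannot be used to mutate.
    const(E)[] view() const { return payload; }
}

void main()
{
    int[] raw = [1, 2, 3];
    auto a = Value!(int[])(raw);
    auto b = a;                  // postblit duplicates the payload

    b[0] = 42;
    b ~= 4;

    assert(raw == [1, 2, 3]);    // the original array is untouched
    assert(a.view == [1, 2, 3]); // a did not observe b's changes
    assert(b.view == [42, 2, 3, 4]);
    writeln(a.view, " ", b.view);
}
```

With this shape, appending can never be observed through another copy, so
it no longer matters whether the runtime reallocates. The cost is a copy on
every pass by value, which is what a COW implementation would try to avoid.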

What do you think?

Andrei