D array expansion and non-deterministic re-allocation

Thu Nov 19 03:56:40 PST 2009

On Wed, 18 Nov 2009 13:21:49 -0500, Bartosz Milewski  
<bartosz-nospam at relisoft.com> wrote:

> Andrei Alexandrescu Wrote:
>
>> Leandro Lucarella wrote:
>> > Andrei Alexandrescu, el 17 de noviembre a las 18:45 me escribiste:
>> >>> 3.  If you **really** care about performance, you should only  
>> append when you
>> >>> don't know the length in advance.  If you know the length, you  
>> should always
>> >>> pre-allocate.
>> >> We will have collections and all those good things, but I don't see
>> >> how the proposal follows from the feedback. My perception is that
>> >> this is a group of people who use D. Bartosz' concern didn't cause
>> >> quite a riot, so as far as I can tell there is no big issue at
>> >> stake.
>> >
>> > I didn't say anything (until now) because this was discussed already.
>> > Dynamic arrays/slices appending is horribly broken (I know it's well
>> > defined, and deterministic, but you are just condemned to make  
>> mistakes,
>> > it just doesn't work as one would expect, even when you know how it  
>> works
>> > you have to keep fighting your intuition all the time).
>>
>> In which ways do you think arrays horribly broken? Same as Bartosz  
>> mentions?
>>
>> One question is whether you actually have had bugs and problems coding,
>> or if arrays stopped you from getting work done.
>>
>
> What Leandro points out is that using D arrays is not straightforward  
> and may present a steep learning curve. In particular, even the beginner  
> would be required to understand section 4.1.9 of TDPL (Expanding). For  
> many people arrays' behavior might be counter-intuitive and a source of  
> mistakes.
>
> In fact, even after reading 4.1.9 I don't know what to expect in some  
> cases. Here's an example:
>
> int[] a = [0];
> auto b = a;
> a ~= 1;
> b ~= 2;
> What is a[1]?
>
> Is this considered "stomping" and requiring a re-allocation?

a[1] would be 1 if an MRU cache was introduced.  This is exactly the sort  
of problem that the MRU cache is supposed to solve.

What should happen is:

a ~= 1 should allocate in place because its parameters were in the MRU  
cache.
b ~= 1 should reallocate because the a~=1 should have removed its  
parameters from the MRU cache.

I agree that arrays behave differently than what people naively expect.   
However, their usage is so much better than anything from other languages  
I've used.  One of the biggest "problems" in understanding arrays I've  
found from having to explain them to other people is that they are half  
value type and half reference type.  I'd say most of the confusion comes  
 from that, but it's that feature that makes them so great IMO.  I'm unsure  
whether introducing a different usage would leave them as great, but I  
hope we can find a way to make them intuitive *and* keep them as useful as  
they are now.

The implementation-defined determinisim is probably another issue that  
bites people, but I think the cases are so much rarer.  Generally when you  
are appending *and* manipulating the original data, you have a unique  
array.  Otherwise your usage falls into either read-only + append usage,  
or buffer usage where you don't usually append.  Data-stomping is a memory  
issue (and a const/invariant issue), so I think that needs the most  
immediate attention.  If it can be solved without changing the syntax or  
system, then I think it's a win-win.  We can always decide to use a new  
concept later if we want to fix the deterministic behavior.

In fact, how hard would it be to introduce a deterministic construct  
*without* changing current behavior?  I mean, if the end result of your  
proposal is you add something like std::vector, why not still allow  
appending to slices *and* add something like std::vector where people can  
use that if they want better behavior?

-Steve