The demise of T[new]

Tue Oct 20 10:34:13 PDT 2009

On Tue, 20 Oct 2009 12:30:57 -0400, Bill Baxter <wbaxter at gmail.com> wrote:

> On Tue, Oct 20, 2009 at 8:50 AM, Steven Schveighoffer
> <schveiguy at yahoo.com> wrote:
>> On Tue, 20 Oct 2009 11:10:20 -0400, Bill Baxter <wbaxter at gmail.com>  
>> wrote:
>>
>>> On Tue, Oct 20, 2009 at 6:25 AM, Steven Schveighoffer
>>> <schveiguy at yahoo.com> wrote:
>>>>
>>>> On Sun, 18 Oct 2009 17:05:39 -0400, Walter Bright
>>>> <newshound1 at digitalmars.com> wrote:
>>>>
>>>>> The purpose of T[new] was to solve the problems T[] had with passing
>>>>> T[] to a function and then the function resizes the T[]. What happens
>>>>> with the original?
>>>>>
>>>>> The solution we came up with was to create a third array type,
>>>>> T[new], which was a reference type.
>>>>>
>>>>> Andrei had the idea that T[new] could be dispensed with by making a
>>>>> "builder" library type to handle creating arrays by doing things like
>>>>> appending, and then delivering a finished T[] type. This is similar
>>>>> to what std.outbuffer and std.array.Appender do, they just need a bit
>>>>> of refining.
>>>>>
>>>>> The .length property of T[] would then become an rvalue only, not an
>>>>> lvalue, and ~= would no longer be allowed for T[].
>>>>>
>>>>> We both feel that this would simplify D, make it more flexible, and
>>>>> remove some awkward corner cases like the inability to say a.length++.
>>>>>
>>>>> What do you think?
>>>>
>>>> At the risk of sounding like bearophile -- I've proposed 2 solutions
>>>> in the past for this that *don't* involve creating a T[new] type.
>>>>
>>>> 1. Store the allocated length in the GC structure, then only allow
>>>> appending when the length of the array being appended matches the
>>>> allocated length.
>>>>
>>>> 2. Store the allocated length at the beginning of the array, and use a
>>>> bit in the array length to determine if it starts at the beginning of
>>>> the block.
>>>>
>>>> The first solution has space concerns, and the second has lots more
>>>> concerns, but can help in the case of having to do a GC lookup to
>>>> determine if a slice can be appended (you'd still have to lock the GC
>>>> to do an actual append or realloc).  I prefer the first solution over
>>>> the second.
>>>>
>>>> I like the current behavior *except* for appending.  Most of the time
>>>> it does what you want, and the syntax is beautiful.
>>>>
>>>> In regards to disallowing x ~= y, I'd propose you at least make it
>>>> equivalent to x = x ~ y instead of removing it.
>>>
>>> If you're going to do ~= a lot then you should convert to the dynamic
>>> array type.
>>> If you're not going to do ~= a lot, then you can afford to write out
>>> x = x ~ y.
>>>
>>> The bottom line is that it just doesn't make sense to append onto a
>>> "view" type.  It's really a kind of constness.  Having a view says the
>>> underlying memory locations you are looking at are fixed.  It doesn't
>>> make sense to imply there's an operation that can change those memory
>>> locations (other than shrinking the window to view fewer of them).
>>
>> Having the append operation extend into already allocated memory is an
>> optimization.  In this case, it's an optimization that can corrupt
>> memory.
>>
>> If we can make append extend into already allocated memory *and* not
>> cause corruption, I don't see the downside.  And then there is one less
>> array type to deal with (, create functions that handle, etc.).
>>
>> Besides, I think Andrei's LRU solution is better than mine (and pretty
>> much in line with it).
>>
>> I still think having an Appender object or struct is a worthwhile
>> thing, the "pre-allocate array then set length to zero" model is a hack
>> at best.
>
> But you still have the problem Andrei posted.  Code like this:
>
> void func(int[] x)
> {
>      x ~= 3;
>      x[0] = 42;
> }

Depending on what you want, you then rewrite:

> void func(int[] x)
> {
>      x[0] = 42;
>      x ~= 3;
> }

or

> void func(int[] x)
> {
>      x = x ~ 3;
>      x[0] = 42;
> }

Generally when you are appending, you are not also changing the original  
data, so you don't care whether it's an optimization or not.

Your code is obviously broken anyway, since *nobody* ever sees the 3: x is
passed by value, so the caller's slice keeps its old length.

> it'll compile and maybe run just fine, but there's no way to know if
> the caller will see the 42 or not.   Unpredictable behavior like that
> is breeding grounds for subtle bugs.

I'm sure we could spend days coming up with code that introduces subtle  
bugs.  You can't fix all bugs that people may write.  I don't think your  
scenario is very likely.

More importantly, the problem with the current appending behavior is this:

void foo(int[] x)
{
   x ~= 3;
   ...
}

That may have just corrupted some data that you don't own, so defensively,  
you should write:

void foo(int[] x)
{
   x = x ~ 3;
   ...
}
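
As a rough sketch of how that corruption can come about (stompDemo and
the local names are made up for illustration, and whether the overwrite
actually happens depends on how the runtime implements ~=), consider
appending to a slice that views only the front of a larger array:

```d
// Hypothetical helper with the same shape as foo above.
void foo(int[] x)
{
    // If the runtime decides x can grow in place, this writes into the
    // memory just past x, which the caller may still be using.
    x ~= 3;
}

// Hypothetical demo: returns the caller's array after the call so the
// effect (if any) can be inspected.
int[] stompDemo()
{
    int[] whole = [10, 20, 30, 40];
    int[] front = whole[0 .. 2];   // a view of the first two elements

    foo(front);

    // With an in-place append, whole[2] may now be 3 instead of 30.
    // With a reallocating append it is still 30. Nothing is asserted
    // about whole[2] because the result depends on the runtime.
    return whole;
}
```

Printing the result of stompDemo() shows which behavior a given runtime
has; elements 0, 1 and 3 are untouched either way.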

But with Andrei's solution, you cannot possibly corrupt data with this
line.  Now, if you then go and set one of the values in the array (like
you did), you may or may not change the caller's data.  But since the
function takes a mutable array, *you own the array*, so it is a mistake
to pass in an array that's not const and expect it to remain unchanged.
If your goal is to affect the original array, then you should accept a
ref argument or not append to it.
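
To make the two options concrete, here is a minimal sketch (the function
names are hypothetical, picked only for this example) of a signature
that says "the caller sees my append" and one that says "the caller's
data is left alone":

```d
// Hypothetical names, for illustration only.

// The caller wants to see the append: take the slice by ref, so the
// caller's pointer and length are updated even if the data moves.
void appendVisible(ref int[] x)
{
    x ~= 3;
}

// The caller's data must stay untouched: accept const elements and
// build the result in a private copy.
int[] appendedCopy(const(int)[] x)
{
    int[] result = x.dup;   // private mutable copy of the elements
    result ~= 3;
    result[0] = 42;         // cannot affect the caller's array
    return result;
}

unittest
{
    int[] a = [1, 2];
    appendVisible(a);
    assert(a == [1, 2, 3]);   // append is visible through the ref

    int[] b = [1, 2];
    auto c = appendedCopy(b);
    assert(b == [1, 2]);      // original untouched
    assert(c == [42, 2, 3]);
}
```

Either way the intent is in the signature, so nobody has to guess
whether the 42 lands in the caller's array.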

-Steve