The demise of T[new]

Tue Oct 20 09:30:57 PDT 2009

On Tue, Oct 20, 2009 at 8:50 AM, Steven Schveighoffer
<schveiguy at yahoo.com> wrote:
> On Tue, 20 Oct 2009 11:10:20 -0400, Bill Baxter <wbaxter at gmail.com> wrote:
>
>> On Tue, Oct 20, 2009 at 6:25 AM, Steven Schveighoffer
>> <schveiguy at yahoo.com> wrote:
>>>
>>> On Sun, 18 Oct 2009 17:05:39 -0400, Walter Bright
>>> <newshound1 at digitalmars.com> wrote:
>>>
>>>> The purpose of T[new] was to solve the problems T[] had with passing a T[]
>>>> to a function that then resizes it. What happens to the original?
>>>>
>>>> The solution we came up with was to create a third array type, T[new],
>>>> which was a reference type.
>>>>
>>>> Andrei had the idea that T[new] could be dispensed with by making a
>>>> "builder" library type to handle creating arrays by doing things like
>>>> appending, and then delivering a finished T[] type. This is similar to
>>>> what std.outbuffer and std.array.Appender do; they just need a bit of
>>>> refining.
>>>>
>>>> The .length property of T[] would then become an rvalue only, not an
>>>> lvalue, and ~= would no longer be allowed for T[].
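
A minimal sketch of such a builder, in D. `ArrayBuilder`, `put`, and `data` are hypothetical names chosen for illustration; this is not the std.array.Appender or std.outbuffer API. The point is that appends go into storage only the builder owns, and the caller receives an ordinary T[] at the end.

```d
/// Hypothetical builder (illustration only): appends go into a buffer the
/// builder owns exclusively; the caller gets a plain T[] when done.
struct ArrayBuilder(T)
{
    private T[] buf;     // storage owned only by the builder
    private size_t len;  // number of elements appended so far

    void put(T value)
    {
        if (len == buf.length)
        {
            // Grow geometrically so n appends cost O(n) copying overall.
            auto bigger = new T[buf.length ? buf.length * 2 : 4];
            bigger[0 .. len] = buf[0 .. len];
            buf = bigger;
        }
        buf[len++] = value;
    }

    /// Hands out the finished array as an ordinary slice.
    /// Assumption: the builder is discarded after this call.
    T[] data()
    {
        return buf[0 .. len];
    }
}
```

Because the builder never exposes its spare capacity, growing the buffer cannot overwrite anyone else's slice; the returned slice does still alias the buffer, which is why the sketch assumes the builder is discarded once data() is called.
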
>>>>
>>>> We both feel that this would simplify D, make it more flexible, and
>>>> remove some awkward corner cases like the inability to say a.length++.
>>>>
>>>> What do you think?
>>>
>>> At the risk of sounding like bearophile -- I've proposed two solutions
>>> in the past for this that *don't* involve creating a T[new] type.
>>>
>>> 1. Store the allocated length in the GC structure, then only allow
>>> appending when the length of the array being appended matches the
>>> allocated length.
>>>
>>> 2. Store the allocated length at the beginning of the array, and use a
>>> bit in the array length to determine if it starts at the beginning of
>>> the block.
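
A hypothetical model of proposal 1 in D, to make the append rule concrete: `Block` stands in for a GC allocation that records how many of its elements are in use, and `Slice` for an array reference into it. Both names are invented for illustration; this is not how the D GC actually stores its metadata.

```d
/// Hypothetical stand-in for a GC block that tracks its used length.
final class Block
{
    int[] storage;  // the whole allocation
    size_t used;    // elements handed out so far (the high-water mark)

    this(size_t n) { storage = new int[n]; }
}

/// Hypothetical stand-in for a T[] that refers into a Block.
struct Slice
{
    Block blk;
    size_t start, len;

    int[] view() { return blk.storage[start .. start + len]; }

    void append(int value)
    {
        immutable end = start + len;
        if (end == blk.used && end < blk.storage.length)
        {
            // This slice ends exactly at the used length: grow in place.
            blk.storage[end] = value;
            blk.used = end + 1;
            ++len;
        }
        else
        {
            // Memory past our end may belong to another slice: copy out.
            auto fresh = new Block(len * 2 + 1);
            fresh.storage[0 .. len] = view();
            fresh.storage[len] = value;
            fresh.used = len + 1;
            blk = fresh;
            start = 0;
            ++len;
        }
    }
}
```

A shorter view of the same data (say, the first two of three used elements) takes the copying branch, so it cannot overwrite the third element that a longer slice still sees; only a slice ending at the high-water mark extends in place.
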
>>>
>>> The first solution has space concerns, and the second has many more
>>> concerns, but it can avoid a GC lookup to determine whether a slice can
>>> be appended (you'd still have to lock the GC to do an actual append or
>>> realloc).  I prefer the first solution over the second.
>>>
>>> I like the current behavior *except* for appending.  Most of the time it
>>> does what you want, and the syntax is beautiful.
>>>
>>> Regarding disallowing x ~= y, I'd propose you at least make it
>>> equivalent to x = x ~ y instead of removing it.
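
A short D check of what that equivalence would mean: binary `~` always allocates a new array, so a function that appends this way and then writes through the result can never affect the caller's data. `appendByConcat` is a hypothetical name used only for this illustration.

```d
/// Hypothetical helper: appends via `x = x ~ y`, which always copies.
int[] appendByConcat(int[] x, int y)
{
    x = x ~ y;  // binary ~ allocates fresh storage every time
    x[0] = 42;  // so this write touches only the new copy
    return x;
}
```

The cost is an allocation and a copy on every append, which is why it is only a fallback for code that appends rarely.
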
>>
>> If you're going to do ~= a lot, then you should convert to the dynamic
>> array type.
>> If you're not going to do ~= a lot, then you can afford to write out
>> x = x ~ y.
>>
>> The bottom line is that it just doesn't make sense to append onto a
>> "view" type.  It's really a kind of constness.  Having a view says the
>> underlying memory locations you are looking at are fixed.  It doesn't
>> make sense to imply there's an operation that can change those memory
>> locations (other than shrinking the window to view fewer of them).
>
> Having the append operation extend into already allocated memory is an
> optimization.  In this case, it's an optimization that can corrupt memory.
>
> If we can make append extend into already allocated memory *and* not cause
> corruption, I don't see the downside.  And then there is one fewer array
> type to deal with (and to create functions that handle, etc.).
>
> Besides, I think Andrei's LRU solution is better than mine (and pretty much
> in line with it).
>
> I still think having an Appender object or struct is worthwhile; the
> "pre-allocate array then set length to zero" model is a hack at best.

But you still have the problem Andrei posted.  Code like this:

void func(int[] x)
{
     x ~= 3;
     x[0] = 42;
}

It'll compile and maybe run just fine, but there's no way to know whether
the caller will see the 42 or not.  Unpredictable behavior like that
is a breeding ground for subtle bugs.
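
The uncertainty can be made visible in present-day D, whose druntime exposes a `capacity` property on slices. The sketch below keeps `func` as written above and adds a hypothetical `sharedAfterAppend` helper: the caller sees the 42 exactly when the slice had room to grow in place. For a slice that does not end at the block's used length (for example a prefix of a longer array), `capacity` is 0 and the append moves the data.

```d
/// The function from the message, unchanged.
void func(int[] x)
{
    x ~= 3;
    x[0] = 42;
}

/// Hypothetical helper: records whether druntime expects an in-place
/// append, then reports whether the caller observes func's write.
bool sharedAfterAppend(int[] a, out bool predictedInPlace)
{
    predictedInPlace = a.capacity > a.length; // room to grow without moving?
    func(a);
    return a[0] == 42;
}
```

Which branch a given call takes depends on allocation history the caller usually cannot see, which is the unpredictability described above.
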

Perhaps that potential for bugs could be reduced by turning off the LRU
stuff in debug builds and just making ~= always reallocate there.
Since, as you said, it's an optimization, it makes sense to turn it on
only in release or optimized builds.
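
A library-level approximation of that suggestion, again only a sketch: a hypothetical `appendTo` helper that always reallocates when compiled with `-debug` and falls back to the in-place `~=` otherwise. The suggestion itself is a change to the runtime's ~=; this merely mimics it for code that opts in.

```d
/// Hypothetical helper: deterministic copying in -debug builds,
/// the in-place optimization otherwise.
void appendTo(T)(ref T[] arr, T value)
{
    debug
    {
        arr = arr ~ value;  // always a fresh copy: never writes shared memory
    }
    else
    {
        arr ~= value;       // may extend in place (the optimization)
    }
}
```
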

To Andrei, do you really feel comfortable trying to explain this in
your book?  It seems like it will be difficult to explain that ~= is
sometimes efficient for appending, but not necessarily if you're
working with a lot of arrays, because under the hood it keeps a cache
that may or may not remember the actual underlying capacity of the
array you're appending to, so you should probably use ArrayBuilder if
you can, despite the optimization.

--bb