Slices and Dynamic Arrays

Tue Jan 2 18:02:27 UTC 2018

On Tuesday, January 02, 2018 07:53:00 Steven Schveighoffer via Digitalmars-
d-learn wrote:
> On 1/1/18 12:18 AM, Jonathan M Davis wrote:
> > A big problem with the term slice though is that it means more than just
> > dynamic arrays - e.g. you slice a container to get a range over it, so
> > that range is a slice of the container even though no arrays are
> > involved at all. So, you really can't rely on the term slice meaning
> > dynamic array. Whether it does or not depends on the context. That
> > means that the fact that a number of folks have taken to using the term
> > slice to mean T[] like the D Slices article talks about tends to create
> > confusion when the context is not clear. IMHO, the D Slices article
> > should be updated to use the correct terminology, but I don't think
> > that the author is willing to do that.
> The problem with all of this is that dynamic array is a defined term
> *outside* of D [1]. And it doesn't mean exactly what D calls dynamic
> arrays.
>
> This is why it's confusing to outsiders, because they are expecting the
> same thing as a C++ std::vector, or a Java/.Net ArrayList, etc. And D
> "array slices" (the proper term IMO) are not the same.
>
> I'm willing to change the article to mention "Array slices" instead of
> just "slices", because that is a valid criticism. But I don't want to
> change it from slices to dynamic arrays, since the whole article is
> written around the subtle difference. I think the difference is important.
>
> -Steve
>
> [1] https://en.wikipedia.org/wiki/Dynamic_array

I completely agree that the distinction between the dynamic array and the
memory that backs it is critical to understanding the semantics when copying
arrays around, and anyone who thinks that the dynamic array itself directly
controls and owns the memory is certainly going to have some problems
understanding the full semantics, but I don't agree that it's required to
talk about the underlying GC-allocated memory buffer as being the dynamic
array for that to be understood - especially when the dynamic array can be
backed with other memory to begin with and still have the same semantics
(just with a capacity of 0 and thus guaranteed reallocation upon appending
or calling reserve). That distinction can be made just fine using the
official D terminology.

I also don't agree that the way that D uses the term dynamic array
contradicts the wikipedia article. What it describes is very much how D's
dynamic arrays behave. It's just that D's dynamic arrays are a bit special
in that they let the GC manage the memory instead of encapsulating it all in
the type itself, and copying them slices the memory instead of copying it
and thus causing an immediate reallocation like you would get with
std::vector or treating it as a full-on reference type like Java does. But
the semantics of what happens when you append to a D dynamic array are the
same as appending to something like std::vector save for the fact that you
might end up having the capacity filled sooner, because another dynamic
array referring to the same memory grew into that space, resulting in a
reallocation - but std::vector would have reallocated as soon as you copied
it. So, some of the finer details get a bit confusing if you expect a
dynamic array to behave _exactly_ like std::vector, but at a high level, the
semantics are basically the same.

On the basis that you seem to be arguing that D's dynamic arrays aren't
really dynamic arrays, I could see someone arguing that std::vector isn't a
dynamic array, because unlike ArrayList, it isn't a reference type and thus
appending to the copy doesn't append to the original - or the other way
around; ArrayList isn't a dynamic array, because appending to a "copy"
affects the original. The semantics of what happens when copying the array
around are secondary to what being a dynamic array actually means, much as
they obviously have a significant effect on how you write your code. The
critical bits are how the memory is continguous and how appending is
amortized to O(1). The semantics of copying clearly vary considerably
depending on the exact implementation even if you ignore what D has done.

I think that your article has been a great help, and the fact that you do a
good job of describing the distinction between T[] and the memory behind it
is critical. I just disagree with the terminology that you used when you did
it, and I honestly think that the terminology used has a negative effect on
understanding and dealing with dynamic arrays backed by non-GC-allocated
memory, because the result seems to be that folks think that there's
something different about them and how they behave (since they don't point
to a "dynamic array" as your article uses the term), when in reality,
there's really no difference in the semantics aside from the fact that their
capacity is guaranteed to be 0 and thus reallocation is guaranteed upon
appending or calling reserve, whereas for GC-backed dynamic arrays, capacity
could be other numbers, and immediate reallocation is not guaranteed any
more than it's guaranteed not to happen; it depends on the capacity, but you
can always know when a reallocation is going to occur based on the capacity,
regardless of what memory backs it.

Yes, if you're dealing with dynamic arrays backed by malloc-ed memory or a
static array, you're going to have to worry about the lifetime of those
dynamic arrays differently if you don't want @safety problems, and for
malloc-ed memory, you're going to have to keep track of a pointer to the
original memory so that it can be freed later, since the GC won't do it for
you, but all of the semantics of the dynamic array itself are the same.
regardless of what memory backs it. Now, maybe that's hard enough to
understand that lots of folks would be misunderstanding that and thinking
that GC-backed dynamic arrays are inherently different even if your article
used the terms in the official manner, but I'm convinced that the way that
it refers to dynamic arrays as being the memory buffer rather than T[]
itself makes that misunderstanding worse as the article as a whole clears up
other misunderstandings.

Regardless, given that the term slice is a rather overloaded term in D,
having the article consistently use the term array slice rather than simply
slice would be an improvement.

- Jonathan M Davis