D array expansion and non-deterministic re-allocation

Mon Nov 23 15:34:48 PST 2009

Steven Schveighoffer wrote to me on November 23 at 15:18:
> On Mon, 23 Nov 2009 11:10:48 -0500, Leandro Lucarella <llucax at gmail.com>
> wrote:
> 
> >Steven Schveighoffer wrote to me on November 23 at 07:34:
> >>>Notice that you are using particular implementation detail (MRU
> >>>cache) to explain the semantics of D arrays. There is a very
> >>>important distinction between language specification and compiler
> >>>implementation. Andrei already had to go pretty deep into
> >>>implementation to describe arrays and slices: You can't define D
> >>>arrays without talking about shared buffers and memory allocation.
> >>>I don't think including the description of the MRU cache in the
> >>>language specification is the right solution. But I'm afraid an
> >>>abstract definition of "stomping" might turn out to be quite
> >>>non-trivial.
> >>
> >>I haven't yet read all the other posts, so someone else may already
> >>have pointed this out but...
> >>
> >>Having an MRU cache makes it so you *don't* have to explain its
> >>semantics (or stomping).  Currently there is a paragraph in the spec
> >>(complete with example) talking about how stomping can occur, so you
> >>can just remove that part.  There is no need to talk about the MRU
> >>cache when talking about arrays, that's an implementation detail.  I
> >>was pointing it out because you are already used to the current bad
> >>implementation :)  I wouldn't even bring up the MRU cache in the
> >>book or the spec.  You just say that you can append to an array and
> >>it may make a copy of the data if necessary.  It's just like
> >>realloc, except safer.
> >
> >The thing is, with realloc() it is less likely that you forget that the
> >data can be copied, because it returns the new pointer (which can be the
> >same as the original pointer). And even in this case, I have seen a lot
> >of bugs related to realloc() misuse (and I made a couple myself).
> >
> >With slices it is much worse.
> 
> realloc is worse.  If I have multiple aliases to the same data, then
> I realloc one of those, the others all can become dangling pointers
> if the runtime decides to move the data.

Well, you are comparing GC vs. no-GC, not realloc() vs. slices. I have no
intention of pursuing that discussion; I'm just saying that realloc() is
less error-prone than slices (and realloc() is error-prone already).

> You also cannot realloc data that's not malloc'd but you can append to
> a slice of non-heap data without issue.

How is that? AFAIK slices use gc_realloc() to do the actual realloc; if
that's done on a piece of malloc'ed data it will fail. And even if it were
true, I don't really see this as a big source of bugs: I have never had
a bug because I tried to realloc() a non-heap piece of memory, or to append
to a slice of non-GC-heap memory.

> No matter what you do with slice appending in D, you cannot access
> dangling pointers unless you access the slice's ptr field.

Again, that's only because D is GCed, not because slices are great.

> Yes, you can run into trouble if you append to a slice and then
> change the original data in the slice, but that is a rare event, and
> it won't result in corrupting memory you didn't already have access
> to (at least, with the MRU cache).

I'm a little lost now. I don't know what hypothetical D you are talking
about. I can't see how the MRU cache can provide any safety. The cache is
finite, and not all the slices will fit in it, so I don't see how it can
protect the slices that are not cached.

Safety can be provided if a ~= b is defined to be semantically the same as
a = a ~ b, leaving the MRU cache as an optimization. In that case we
agree slices are predictable and a little safer (because stomping is not
possible). But they are still error-prone if you expect them to be a full
reference type. Since slices are the only entity in D with such semantics,
it is very easy to forget this and introduce subtle bugs. In this
case, I really think providing ~= is a bad idea: it's just too error-prone
and doesn't give you anything.

I still think the best option is to make slices immutable value types (you
can mutate the data they point to, you just can't modify the slice itself,
i.e. ptr and length), and provide a proper dynamic array type in the
library or something.

-- 
Leandro Lucarella (AKA luca)                     http://llucax.com.ar/
----------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------
The biggest lie you can tell yourself is
When I get what I want I will be happy