"Range invalidation" ?

Jonathan M Davis via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Wed Aug 30 08:06:12 PDT 2017


On Wednesday, August 30, 2017 13:28:48 Mark via Digitalmars-d-learn wrote:
> C++ has the issue of iterator invalidation, where certain
> operations on a container while iterating on it may invalidate
> the iterator, in which case it is no longer safe to use the
> iterator.
>
> D has ranges, but presumably the same issue can arise in D. For
> instance, if I have a ForwardRange and I use the save primitive
> to keep a reference to some tail of it, then I can manipulate one
> of the ranges in a way that may leave the other in an undefined
> state. For instance, if the range is allocated on the heap and at
> some point I free its memory, then the range's copy is going to
> point to some garbage.
>
> Is there some documentation anywhere on:
> 1. Primitive operations, e.g. concatenation, that may invalidate
> a range.
> 2. Funtions in Phobos, operating on a range, e.g. (filter, sort,
> etc.), that may invalidate it?

It's entirely dependent on the implementation. _Most_ ranges cannot be
invalidated. They're usually either purely generative and aren't backed by
much in the way of memory (just the information needed to generate the next
value, which is likely held directly in a struct), or they're ultimately
backed by dynamic arrays, in which case, the only risk would be something
which had direct access to the array mutating its elements. Not even
mutating the length of the array would affect the range, because the range
itself would have a dynamic array which was a slice of the other, in which
case, the only affect that can be transferred from one to the other would be
the mutation of the elements that they both refer to.

The times when range invalidation comes into play is when it refers to
non-GC-allocated memory (e.g. malloce-ed memory or memory on the stack -
like if it has a dynamic array which is a slice of a static array) and when
the range is for an actual container whose elements may change. And whether
a range over a container would be invalidated by something like an element
being added or removed is entirely dependent on how the range and container
are implmented. Odds are that with something like Array, removing elements
will screw with the range, because like vector or a dynamic array, Array is
dealing with a contiguous block of memory, and unless Array is allocating a
new block of memory when it's altered so that the old is still valid for any
ranges (and it's obvously not doing that, because that would go against its
charter), then altering an Array is bound to screw up any ranges that refer
to it. Something like RedBlackTree may or may not be screwed up (it likely
depends on which elements are added or removed and which the range refers
to), but if something is removed from the middle of the range (or added to
it), then it would be at least insofar as it wouldn't refer to the same
elements anymore.

Really, unless a container goes to a bunch of extra work to keep track of
whether a range exists which refers to a particular element or range of
elements and ensure that any ranges continue to refer to the same elements
even when the container is mutated, mucking with a container is going to
muck with the range. And so in general, with a container, the only options
for ranges are either going to be to potentially allocate memory for the
range itself (thus copying a section of the container rather than just
referring to it) whenever the underlying container is mutated in a way that
would invalidate the range, or it's going to have to do something like is
done in Java and make it so that the range knows if it's been invalidated
and throws an Exception or Error if it's used after that. The invalidation
problem is just all around worse with ranges backed by containers than it is
for iterators backed by containers, because ranges refer to more elements,
and they ususally refer to a contiguous set of elements rather than the
elements individually.

Dynamic arrays avoid all of this because any memory that they shared is
never mucked with by mutating an individual dynamic array except in the case
where an element is mutated. Altering the length of one dynamic array does
not alter the length of memory of any other dynamic array, even if they're
slices of the same memory. But that also means that they end up reallocating
pretty easily if you do anything other than simply append to them, whereas
removing an element from a typical container does not result in a
reallocation.

In any case, it's ultimately going to come down to the implementation of the
container in question. IIRC, std.container is not at all clear about which
operations do and don't invalidate ranges, but in general, it's hard enough
to make it so that a range stays valid after a container is mutated that the
odds are that any range which continues to be used after its underlying
container has been mutated is not going to contain the same elements
anymore, and it might blow up in your face. So, unless the documentation is
explicit about it, it's probably just best to assume that a range over a
container is no longer valid after the container has been mutated.

Honestly, containers is the big place where ranges tend to be worse than
iterators, and we haven't done all that good a job with containers in
general for Phobos. Supposedly, Andrei is working on a new set of
containers, but they have yet to materialize. And all of the issues that
arise from ranges refering to more than one element and not being able to
figure out that a range refers to a container once it's wrapped in a another
range makes stuff like remove functions difficult. std.container solved that
problem by using take/Take so that remove takes Take!Range and is thus able
to know that the underlying range is from that container while also
understanding how the wrapper range works so that it only removes the
elements refered to by the wrapper range, but that doesn't scale. You can do
it for a few set of primitives, but even then it's worse than with
iterators. And invalidation is inherently worse for ranges without the
container playing games and potentially essentially allocating a new
container just for the range, which is unlikely to be cheap. I think that
the general sentiment has been that folks want D's containers to not have
range invalidation problems, but that's not necessarily a easy nut to crack
while being efficient.

- Jonathan M Davis



More information about the Digitalmars-d-learn mailing list