std.collection - changing the collection while iterating

H. S. Teoh via Digitalmars-d digitalmars-d at puremagic.com
Sun Jun 21 16:55:12 PDT 2015


On Sun, Jun 21, 2015 at 04:02:43PM -0700, Andrei Alexandrescu via Digitalmars-d wrote:
> While I work on making std.allocator better, here's some food for
> thought regarding std.collection.
> 
> Consider a traditional container with reference semantics, Java-style.
> Regarding changing the collection (e.g. adding/removing elements)
> while iterating, we have the following possibilities:

Modifying a container while iterating over it generally will always have
counterintuitive behaviour, even if well-defined (in general, it's
ill-defined because it depends on implementation details), unless you
restrict it to a very narrow set of permissible modifications.

Consider, for instance, a simple linked-list of 5 elements:

	A -> B -> C -> D -> E

Suppose you're iterating over this list, and have a reference (or range,
or whatever, it's not important what) to, say, C:

	A -> B -> C -> D -> E
	          ^
	          currentPtr

Let's say we decide to delete C. Since this is a linked list, deleting C
will modify the .next pointer from B to point to D instead.

	A -> B -> D -> E

		C
		^
		currentPtr

Depending on the implementation, C.next may be reset to null, which
causes the iteration over the list to terminate prematurely, since
currentPtr.next is null, and it has no way to find where the
logically-next element D is. Therefore, the iteration sequence will be
A, B, C, instead of the intuitively-expected A, B, C, D, E.

To solve this problem, we may cache the successor of C in currentPtr, so
that we know to go to D next. However, this also breaks down if we
delete *both* C and D while currentPtr is at C. It will result in the
iteration sequence A, B, C, D instead of the expected A, B, C, E.


Linked-lists are relatively simple, so there may be a way to get
"consistent" (i.e., intuitive) behaviour from modifying it in the middle
of an iteration, but this breaks down when dealing with more complex
containers, say an unordered AA. Iterating over an AA, in theory, should
give you the elements in some implementation-defined order.  However, if
the AA is modified while in the middle of iteration, all sorts of
strange things may happen: deleting or inserting an element may cause an
AA rehash, which completely changes the sequence of iteration. The
current iteration may, as a result, miss some elements or repeat
elements that have already been processed earlier. Some newly-added
elements may be visited by the current iteration but others not, because
they hashed to an earlier address than the current bucket.

If you have multiple ranges iterating over the container at different
rates, the story becomes a whole lot more complex, and counterintuitive
behaviour is almost guaranteed.


> 1. Leave it undefined, like the STL does. Probably this is too
> extreme.

It may be extreme, but it also seems sanest, since it avoids treading in
the dangerous territory of strange and counterintuitive behaviours when
a container is modified in the middle of an iteration.


> 2. Throw an exception if a remove is attempted while ranges exist.
> This works but it's conservative - e.g. it throws even if those ranges
> are never to be used again etc.

This is too straitjacketed.


> 3. Allow the removal but throw from the ranges if there's any attempt
> to use a range following a remove.

This is also too extreme. If I'm iterating over a linked-list on the
50th element, there is no reason why deleting the first element of the
list should cause an exception when I later want to visit the 51st
element.


> 4. Make it work such that ranges continue to be valid post removal.
> Perhaps this is on the inefficient side.
[...]

As I pointed out above, this is not only inefficient, but also hard (if
not impossible) to define in any way that behaves consistently in every
scenario. Basically, if you allow arbitrary modification of a container
while iterating over it, you will always run into some corner case that
behaves counterintuitively, no matter what you do.

Thankfully, if we restrict the scope of container modifications to the
current element being iterated over, then there is a way to do things
consistently (I believe the container library written by a forum member
does this). However, this requires explicit container support, and may
or may not work if you have multiple ranges iterating over the container
simultaneously. It will probably introduce inefficiencies.

Arbitrary container modification and iteration simply don't mix.


T

-- 
They pretend to pay us, and we pretend to work. -- Russian saying


More information about the Digitalmars-d mailing list