Container insertion and removal

Sun Mar 7 05:23:03 PST 2010

Robert Jacques Wrote:

> On Sat, 06 Mar 2010 21:54:50 -0500, Steven Schveighoffer  
> <schveiguy at yahoo.com> wrote:
> > On Sat, 06 Mar 2010 11:19:15 -0500, Robert Jacques <sandford at jhu.edu>  
> > wrote:
> >
> >> On Sat, 06 Mar 2010 08:19:36 -0500, Steven Schveighoffer  
> >> <schveiguy at yahoo.com> wrote:
> >>>
> >>> How can softRemove not affect iterating ranges?  What if the range is  
> >>> positioned on the element removed?
> >>
> >> It always affects ranges in so far as the range and container are  
> >> inconsistent, but that is a problem of softInsert as well. Removing an  
> >> element from an array doesn't invalidate the underlying range, since  
> >> the memory is still there. And so long as you're not trying to use free  
> >> lists, linked-lists, trees, etc. can be written so that the ranges  
> >> never enter an invalid state. If they happen to be pointing at the  
> >> removed node, they just end up stopping early.
> >
> > If my linked list range has two node pointers as the implementation, and  
> > you remove the end node, it's going to end later, not early.
> 
> A linked list maps to a forward range: so the range simply points to a  
> single internal node. If you're going to store a pointer to the end, then  
> you should be using a doubly linked list, not a singly linked list.

How do you know when to stop?  A range has a beginning and an ending, otherwise it's an iterator.  Whether you store it via a pointer to the last (not-to-be-iterated) node, or via a length, or via a value to compare with for stopping, you have to use something.  Or are you asserting the only useful range for a linked list iterates the entire list?

Think of it as the equivalent of a slice of an array.

> 
> > Of course, you can get around this by just storing a current node and a  
> > length, but that may not be what the user is expecting.  For example, if  
> > you did a range of a sorted tree from "a" to "f" and it stops at "n", I  
> > think that would be extremely unexpected.
> 
> By this logic, any and all forms of mutation, not just insertions and  
> deletions must not be allowed.

Allowed, yes.  Not invalidating iterators, no.

> 
> > Stopping early is invalidation also IMO.  If your program logic depends  
> > on iterating over all the elements you originally intended to iterate,  
> > then we have big problems if they stop early.
> 
> Stopping early is the result of a logical error by the programmer. The  
> code itself, however, is completely valid.

I still fail to see the difference between "soft" operations and non-soft.  What does soft guarantee?  Give me a concrete definition, an example would help too.

> >>> The only two containers that would support softInsert would be linked  
> >>> list and sorted map/set.  Anything else might completely screw up the  
> >>> iteration.  I don't see a lot of "generic" use for it.
> >>
> >> There's all the containers based upon linked-lists, etc like hashes,  
> >> stacks, queues and dequeues.
> >
> > Hashes may be rehashed when inserting, completely invalidating a range  
> > (possibly the end point will be before the starting point).
> 
> Wouldn't re-hashing necessitate re-allocation? (Thus the range would see a  
> stale view)

God no.  If my hash collision solution is linked-list based (which it is in dcollections), why should I reallocate all those nodes?  I just rearrange them in a new bucket array.

> 
> > Yes, the others you mentioned will be valid.  But I still don't see it  
> > being more useful than just using documentation to indicate insertion  
> > will not invalidate ranges.  Perhaps I am wrong.
> 
> The difference is that algorithms can document in their template  
> constraints that they need a container with 'soft' properties.

What is the advantage?  Why would an algorithm require soft functions?  What is an example of such an algorithm?

> 
> >>> Another option is to use a "mutation" field that is checked every  
> >>> chance by the range.  If it changes, then the range is invalidated.
> >>
> >> The mutation field would have to be a version number to support  
> >> multiple ranges, and given experience with lock-free algorithms which  
> >> use a 'tag' in a similar manner, this concept is bug prone and should  
> >> not be relied upon. It would be better to 'lock' the node or container  
> >> to topology changes, though this does slow things down and has no  
> >> conflict resolution: removing a locked node would have to throw an  
> >> exception.
> >
> > I was not thinking of multithreaded applications.  I don't think it's  
> > worth making containers by default be multithreaded safe.
> 
> I wasn't thinking of multi-threaded containers. I was trying to point out  
> that version ids have failed in lock-free containers, where things are  
> happening on the order of a single atomic op or a context switch. Given  
> the time a range could go unused in standard code, versioning won't work.

Are you trying to say that if you don't use your range for exactly 2^32 mutations, it could mistakenly think the range is still valid?  That's a valid, but very very weak point.

> 
> > The mutation index has been used in Tango forever, and I think was in  
> > Doug Lea's original container implementations.  I'm pretty sure it is  
> > sound in single-threaded uses.
> 
> No it's not. version tags + integer overflow = bug. Doug Lea knew about  
> the problem and but thought it would never happen in real code. And Bill  
> Gates thought no one will need more than 640k of ram. They both have been  
> proven wrong.

Overflow != bug.  Wrapping completely to the same value == bug, but is so unlikely, it's worth the possibility.

-Steve