Container insertion and removal
Steven Schveighoffer
schveiguy at yahoo.com
Sun Mar 7 05:23:03 PST 2010
Robert Jacques Wrote:
> On Sat, 06 Mar 2010 21:54:50 -0500, Steven Schveighoffer
> <schveiguy at yahoo.com> wrote:
> > On Sat, 06 Mar 2010 11:19:15 -0500, Robert Jacques <sandford at jhu.edu>
> > wrote:
> >
> >> On Sat, 06 Mar 2010 08:19:36 -0500, Steven Schveighoffer
> >> <schveiguy at yahoo.com> wrote:
> >>>
> >>> How can softRemove not affect iterating ranges? What if the range is
> >>> positioned on the element removed?
> >>
> >> It always affects ranges in so far as the range and container are
> >> inconsistent, but that is a problem of softInsert as well. Removing an
> >> element from an array doesn't invalidate the underlying range, since
> >> the memory is still there. And so long as you're not trying to use free
> >> lists, linked-lists, trees, etc. can be written so that the ranges
> >> never enter an invalid state. If they happen to be pointing at the
> >> removed node, they just end up stopping early.
> >
> > If my linked list range has two node pointers as the implementation, and
> > you remove the end node, it's going to end later, not early.
>
> A linked list maps to a forward range: so the range simply points to a
> single internal node. If you're going to store a pointer to the end, then
> you should be using a doubly linked list, not a singly linked list.
How do you know when to stop? A range has a beginning and an ending, otherwise it's an iterator. Whether you store it via a pointer to the last (not-to-be-iterated) node, or via a length, or via a value to compare with for stopping, you have to use something. Or are you asserting the only useful range for a linked list iterates the entire list?
Think of it as the equivalent of a slice of an array.
>
> > Of course, you can get around this by just storing a current node and a
> > length, but that may not be what the user is expecting. For example, if
> > you did a range of a sorted tree from "a" to "f" and it stops at "n", I
> > think that would be extremely unexpected.
>
> By this logic, any and all forms of mutation, not just insertions and
> deletions must not be allowed.
Allowed, yes. Not invalidating iterators, no.
>
> > Stopping early is invalidation also IMO. If your program logic depends
> > on iterating over all the elements you originally intended to iterate,
> > then we have big problems if they stop early.
>
> Stopping early is the result of a logical error by the programmer. The
> code itself, however, is completely valid.
I still fail to see the difference between "soft" operations and non-soft. What does soft guarantee? Give me a concrete definition, an example would help too.
> >>> The only two containers that would support softInsert would be linked
> >>> list and sorted map/set. Anything else might completely screw up the
> >>> iteration. I don't see a lot of "generic" use for it.
> >>
> >> There's all the containers based upon linked-lists, etc like hashes,
> >> stacks, queues and dequeues.
> >
> > Hashes may be rehashed when inserting, completely invalidating a range
> > (possibly the end point will be before the starting point).
>
> Wouldn't re-hashing necessitate re-allocation? (Thus the range would see a
> stale view)
God no. If my hash collision solution is linked-list based (which it is in dcollections), why should I reallocate all those nodes? I just rearrange them in a new bucket array.
>
> > Yes, the others you mentioned will be valid. But I still don't see it
> > being more useful than just using documentation to indicate insertion
> > will not invalidate ranges. Perhaps I am wrong.
>
> The difference is that algorithms can document in their template
> constraints that they need a container with 'soft' properties.
What is the advantage? Why would an algorithm require soft functions? What is an example of such an algorithm?
>
> >>> Another option is to use a "mutation" field that is checked every
> >>> chance by the range. If it changes, then the range is invalidated.
> >>
> >> The mutation field would have to be a version number to support
> >> multiple ranges, and given experience with lock-free algorithms which
> >> use a 'tag' in a similar manner, this concept is bug prone and should
> >> not be relied upon. It would be better to 'lock' the node or container
> >> to topology changes, though this does slow things down and has no
> >> conflict resolution: removing a locked node would have to throw an
> >> exception.
> >
> > I was not thinking of multithreaded applications. I don't think it's
> > worth making containers by default be multithreaded safe.
>
> I wasn't thinking of multi-threaded containers. I was trying to point out
> that version ids have failed in lock-free containers, where things are
> happening on the order of a single atomic op or a context switch. Given
> the time a range could go unused in standard code, versioning won't work.
Are you trying to say that if you don't use your range for exactly 2^32 mutations, it could mistakenly think the range is still valid? That's a valid, but very very weak point.
>
> > The mutation index has been used in Tango forever, and I think was in
> > Doug Lea's original container implementations. I'm pretty sure it is
> > sound in single-threaded uses.
>
> No it's not. version tags + integer overflow = bug. Doug Lea knew about
> the problem and but thought it would never happen in real code. And Bill
> Gates thought no one will need more than 640k of ram. They both have been
> proven wrong.
Overflow != bug. Wrapping completely to the same value == bug, but is so unlikely, it's worth the possibility.
-Steve
More information about the Digitalmars-d
mailing list