Container insertion and removal
Robert Jacques
sandford at jhu.edu
Sat Mar 6 23:10:09 PST 2010
On Sat, 06 Mar 2010 21:54:50 -0500, Steven Schveighoffer
<schveiguy at yahoo.com> wrote:
> On Sat, 06 Mar 2010 11:19:15 -0500, Robert Jacques <sandford at jhu.edu>
> wrote:
>
>> On Sat, 06 Mar 2010 08:19:36 -0500, Steven Schveighoffer
>> <schveiguy at yahoo.com> wrote:
>>>
>>> How can softRemove not affect iterating ranges? What if the range is
>>> positioned on the element removed?
>>
>> It always affects ranges insofar as the range and container become
>> inconsistent, but that is a problem of softInsert as well. Removing an
>> element from an array doesn't invalidate the underlying range, since
>> the memory is still there. And so long as you're not using free lists,
>> containers like linked lists and trees can be written so that their
>> ranges never enter an invalid state. If a range happens to be pointing
>> at the removed node, it just ends up stopping early.
>
> If my linked list range has two node pointers as the implementation, and
> you remove the end node, it's going to end later, not early.
A singly linked list maps to a forward range, so the range simply points
to a single internal node. If you're going to store a pointer to the end,
then you should be using a doubly linked list, not a singly linked list.
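As a rough sketch of what I mean (hypothetical names, not any library's
actual code), the range holds exactly one node pointer, and a 'soft'
removal leaves that node readable:

// Minimal singly linked list; the forward range is just one node pointer.
struct SList(T)
{
    static struct Node { T value; Node* next; }
    Node* head;

    static struct Range
    {
        Node* current;
        @property bool empty() const { return current is null; }
        @property ref T front()      { return current.value; }
        void popFront()              { current = current.next; }
    }

    Range opSlice() { return Range(head); }

    void insertFront(T value) { head = new Node(value, head); }

    // 'Soft' removal: unlink the node but don't recycle its memory, and
    // clear its next pointer. A range parked on that node stays in a
    // valid state; it simply stops early instead of blowing up.
    void softRemove(Node* n)
    {
        if (head is n)
            head = n.next;
        else
        {
            auto p = head;
            while (p !is null && p.next !is n) p = p.next;
            if (p !is null) p.next = n.next;
        }
        n.next = null;
    }
}

unittest
{
    SList!int list;
    foreach_reverse (i; 0 .. 3) list.insertFront(i);  // list is 0, 1, 2
    auto r = list[];
    r.popFront();                  // r is parked on the node holding 1
    list.softRemove(r.current);    // remove that very node
    assert(r.front == 1);          // still readable...
    r.popFront();
    assert(r.empty);               // ...and the range just stops early
}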
> Of course, you can get around this by just storing a current node and a
> length, but that may not be what the user is expecting. For example, if
> you did a range of a sorted tree from "a" to "f" and it stops at "n", I
> think that would be extremely unexpected.
By this logic, any and all forms of mutation, not just insertions and
deletions, would have to be disallowed.
> Stopping early is invalidation also IMO. If your program logic depends
> on iterating over all the elements you originally intended to iterate,
> then we have big problems if they stop early.
Stopping early is the result of a logical error by the programmer. The
code itself, however, is completely valid.
>>> The only two containers that would support softInsert would be linked
>>> list and sorted map/set. Anything else might completely screw up the
>>> iteration. I don't see a lot of "generic" use for it.
>>
>> There are all the containers based upon linked lists, etc., like hashes,
>> stacks, queues and deques.
>
> Hashes may be rehashed when inserting, completely invalidating a range
> (possibly the end point will be before the starting point).
Wouldn't re-hashing necessitate re-allocation? (Thus the range would see a
stale view)
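For illustration, D's built-in arrays already give the stale-but-valid
behavior I'd expect a reallocating rehash to have:

int[] a = [1, 2, 3];
auto old = a;            // a view of the current block
a.length = 1_000_000;    // forces reallocation; the GC keeps the old block
a[0] = 42;
assert(old[0] == 1);     // stale, but perfectly safe to read
assert(old.length == 3);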
> Yes, the others you mentioned will be valid. But I still don't see it
> being more useful than just using documentation to indicate insertion
> will not invalidate ranges. Perhaps I am wrong.
The difference is that algorithms can document in their template
constraints that they need a container with 'soft' properties.
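Something along these lines (the trait and member names here are made up,
not a settled API):

import std.range : isInputRange;

// Hypothetical trait: does the container advertise a removal operation
// that promises not to invalidate ranges already in flight?
template hasSoftRemove(C)
{
    enum hasSoftRemove = __traits(hasMember, C, "softRemove");
}

// The algorithm states its requirement in the constraint, so handing it a
// container without the guarantee is a compile error rather than a line
// in the documentation.
void removeIf(alias pred, C)(ref C container)
    if (hasSoftRemove!C && isInputRange!(typeof(C.init[])))
{
    // Removing while iterating is only legal because softRemove promises
    // not to invalidate the range in flight.
    foreach (e; container[])
        if (pred(e))
            container.softRemove(e);   // removal-by-value signature assumed
}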
>>> Another option is to use a "mutation" field that is checked every
>>> chance by the range. If it changes, then the range is invalidated.
>>
>> The mutation field would have to be a version number to support
>> multiple ranges, and given experience with lock-free algorithms which
>> use a 'tag' in a similar manner, this concept is bug prone and should
>> not be relied upon. It would be better to 'lock' the node or container
>> to topology changes, though this does slow things down and has no
>> conflict resolution: removing a locked node would have to throw an
>> exception.
>
> I was not thinking of multithreaded applications. I don't think it's
> worth making containers thread-safe by default.
I wasn't thinking of multithreaded containers. I was trying to point out
that version IDs have failed in lock-free containers, where things happen
on the order of a single atomic op or a context switch. Given how long a
range can go unused in ordinary code, versioning won't work.
> The mutation index has been used in Tango forever, and I think was in
> Doug Lea's original container implementations. I'm pretty sure it is
> sound in single-threaded uses.
No, it's not: version tags + integer overflow = bug. Doug Lea knew about
the problem but thought it would never happen in real code. And Bill
Gates thought no one would need more than 640K of RAM. They have both
been proven wrong.
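To spell the failure mode out (again with made-up names; this is not
Tango's or Doug Lea's actual code):

import std.exception : enforce;

// The container bumps a counter on every topology change; each range
// snapshots the counter when it is created and compares on every use.
struct Versioned(T)
{
    T[] data;
    uint mutation;

    void insert(T value) { data ~= value; ++mutation; }

    struct Range
    {
        Versioned!T* owner;
        size_t index;
        uint snapshot;     // mutation count when the range was made

        @property bool empty()
        {
            // The invalidation check: compare counters.
            enforce(snapshot == owner.mutation, "range invalidated");
            return index >= owner.data.length;
        }
        @property ref T front() { return owner.data[index]; }
        void popFront()          { ++index; }
    }

    Range opSlice() { return Range(&this, 0, mutation); }
}

// The hole: let a range sit unused while exactly 2^32 mutations happen,
// the uint wraps back to the stored snapshot, and a genuinely stale range
// passes the check. "It will never happen in real code" is exactly the
// assumption overflow bugs are made of.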