We need to rethink remove in std.container

Lutger Blijdestijn lutger.blijdestijn at gmail.com
Tue Feb 22 05:53:56 PST 2011


Jonathan M Davis wrote:

> On Tuesday 22 February 2011 03:52:34 Lutger Blijdestijn wrote:
>> Jonathan M Davis wrote:
>> > On Tuesday 22 February 2011 01:04:48 Lutger Blijdestijn wrote:
>> >> Jonathan M Davis wrote:
>> >> > Okay, removing elements from a container sucks right now. You can do
>> >> > stuff like removeAny (generally pretty useless IMHO) or removeFront
>> >> > just fine, but removing an arbitrary range from a container just
>> >> > plain sucks.
>> >> > 
>> >> > remove takes a range type which is the range type for the container
>> >> > that it's on. That makes sense. It's obviously not going to be able
>> >> > to a take an arbitrary range and remove that from itself. How would
>> >> > it know which elements in that range corresponded to which elements
>> >> > in itself - especially when it could be a range which skips elements
>> >> > or something similar? So, _that_ part makes sense.
>> >> 
>> >> I don't understand why not. Given, as you said, that a lot of
>> >> functions return a new type it would make sense. My answer as to how
>> >> would be something like:
>> >> 
>> >> while range not empty
>> >> 
>> >>     container.remove range.front
>> >>     range.popFront
>> >> 
>> >> Is there a problem with that?
>> > 
>> > It's horribly inefficient for one. Also, I'm not sure that the range is
>> > valid any more after the remove, since it's front was removed. popFront
>> > may not work correctly.
>> 
>> It's a matter of range (in)validation right? I admit at this point I'm
>> not sure how it is supposed to work. remove + popFront is a bad idea if
>> the range is a view into the contained from where stuff is removed, but
>> in c++ this works for map: some_map.erase(i++) where i is an iterator. So
>> it depends on the type of container what is supported I think. But
>> perhaps this falls under Walter's category of useless wankery, since it
>> is easy to write yourself.
> 
> It depends on the implementation. It's possible that iterating an iterator
> or popping the front of a range where that iterator no longer points to a
> valid element actually works. It's also possible that, because it no
> longer points to a valid element, it _doesn't_ work. It wouldn't surprise
> me at all that some_map.erase(i++) isn't guaranteed to work in C++ at all
> but that you've just gotten lucky with the STL implementation that you've
> been using. I'd have to research it to be sure. But my guess would be that
> you've been lucky.

I'm fairly certain this is guaranteed, but specifically tied to std::map. 
The iterator returns the old value and is incremented before erasure, which 
doesn't invalidate iterators except the one that got removed.
 
>> >> > But have you actually tried to get a range of the appropriate type
>> >> > to remove from a container? It seems like almost ever function in
>> >> > std.range and std.algorithm returns a new range type, making them
>> >> > completely useless for processing a range to be removed from a
>> >> > container.
>> >> > 
>> >> > I was looking to remove a single element for a RedBlackTree. The
>> >> > best function that I could think to get the proper range was
>> >> > findSplit. The middle portion of the return value would be the
>> >> > portion that I was looking for. I'd take that and pass it to
>> >> > RedBlackTree's remove. Wrong. It uses takeExactly in its
>> >> > implementation and the first two portions of the result of findSplit
>> >> > aren't the right range type.
>> >> > 
>> >> > So, what do I do? The _only_ thing that I can think of at the moment
>> >> > is to use find to locate the beginning of the range that I want,
>> >> > take the length of the range with walkLength and then use popBackN
>> >> > to pop of the elements I don't want. e.g.
>> >> 
>> >> The table in the docs mention stableRemoveAny(v) which says "Same as
>> >> c.removeAny(v), but is guaranteed to not invalidate any iterators."
>> >> 
>> >> Though c.removeAny(v) itself is not listed in the table nor
>> >> implemented in RedBlackTree, isn't this the right function for the
>> >> job? (I take it v stands for a value of the container, rather than a
>> >> range). builtin aa's also implement this function.
>> > 
>> > removeAny doesn't solve the problem. For starters, the table appears to
>> > be wrong in that stableRemoveAny takes a value and removeAny doesn't.
>> > So, either stableRemoveAny is supposed to not take a value or there's
>> > supposed to be another version of removeAny which takes a value, and
>> > it's missing. So, I don't know what exactly is intended with
>> > stableRemoveAny.
>> > 
>> > Regardless of that, however, remove still needs to work. It's perfectly
>> > valid to want to remove a range of elements rather than just one.
>> > 
>> > - Jonathan M Davis
>> 
>> (stable)removeAny does not solve the problem of removing ranges, but
>> there must be *something* that solves the problem of removing one
>> element. You found one way to do it for RedBlackTree (i gave up), but if
>> I am not mistaken it doesn't have the right complexity and efficient
>> removal is a key property of this container.
> 
> If you could just do take or takeExactly on the range you got from find
> and pass it to remove, then it wouldn't matter whether you wanted to
> remove 1 element or many. The only difference is the number passed to
> take. The problem is that that doesn't work, because take returns the
> wrong range type. It has no way to get n elements from the front of the
> range and have them be in a range which is the same type as the orignal.
> So, I don't think that one element vs many is really the problem here.
> It's the inability to get the first n elements of a forward range and
> still retain the same range type.

There are two problems I think. What if I want to remove an array of strings 
from a bst of strings? That should be supported, even if only by removing 
them one by one yourself. If all we got was remove(Range) then this scenario 
isn't supported at all. Another problem is the efficiency. With a bst, using 
find + take + remove has to locate the element twice (once for find, once 
for remove), whereas remove(value) would only need to do this once. 

> Regardless, you're right. the solution I found most definitely does _not_
> have the right complexity. Because you have to use walkLength on the range
> that you get from find, it becomes O(n) instead of O(log n), and O(log n)
> is what it's _supposed_ to be for a red black tree. As I said, I don't
> believe that the current situation with remove is tenable, and the only
> solution that I see at the moment is to change forward range so that it
> has a function which returns the first n elements of that range with the
> same range type as that range.
> 
> - Jonathan M Davis


More information about the Digitalmars-d mailing list