We need to rethink remove in std.container
Denis Koroskin
2korden at gmail.com
Tue Feb 22 00:43:25 PST 2011
On Tue, 22 Feb 2011 05:55:20 +0300, Jonathan M Davis <jmdavisProg at gmx.com>
wrote:
> Okay, removing elements from a container sucks right now. You can do
> stuff like
> removeAny (generally pretty useless IMHO) or removeFront just fine, but
> removing
> an arbitrary range from a container just plain sucks.
>
> remove takes a range type which is the range type for the container that
> it's
> on. That makes sense. It's obviously not going to be able to a take an
> arbitrary
> range and remove that from itself. How would it know which elements in
> that
> range corresponded to which elements in itself - especially when it
> could be a
> range which skips elements or something similar? So, _that_ part makes
> sense.
>
> But have you actually tried to get a range of the appropriate type to
> remove
> from a container? It seems like almost ever function in std.range and
> std.algorithm returns a new range type, making them completely useless
> for
> processing a range to be removed from a container.
>
> I was looking to remove a single element for a RedBlackTree. The best
> function
> that I could think to get the proper range was findSplit. The middle
> portion of
> the return value would be the portion that I was looking for. I'd take
> that and
> pass it to RedBlackTree's remove. Wrong. It uses takeExactly in its
> implementation and the first two portions of the result of findSplit
> aren't the
> right range type.
>
> So, what do I do? The _only_ thing that I can think of at the moment is
> to use
> find to locate the beginning of the range that I want, take the length
> of the
> range with walkLength and then use popBackN to pop of the elements I
> don't want.
> e.g.
>
> ======
> import std.algorithm, std.container, std.conv,
> std.range, std.stdio;
>
> void main()
> {
> auto rbt = RedBlackTree!int([0, 2, 5, 12, 59, 22]);
> assert(to!string(rbt[]) == "[0, 2, 5, 12, 22, 59]");
>
> auto found = find(rbt[], 5);
> assert(to!string(found) == "[5, 12, 22, 59]");
>
> popBackN(found, walkLength(found) - 1);
> assert(to!string(found) == "[5]");
>
> rbt.remove(found);
> assert(to!string(rbt[]) == "[0, 2, 12, 22, 59]");
> }
> ======
>
> That's disgusting. All that just to remove one element? And what if the
> range
> isn't bidirectional (e.g. SList)? Well, then you have no popBack, and as
> far as
> I can tell, you're screwed, since you can't use either take or
> takeExactly,
> because both of them return new range types.
>
> In particular, the fact that you can't take a range and create a new one
> of the
> same type from its first n elements is highly problematic. Maybe we need
> to add a
> function to ForwardRange which returns a new range with the first n
> elements (it
> certainly looks like that's the key element that's missing here). I
> don't know
> if that would be reasonable, but the fact that you can't create a range
> of the
> same type as the original range when taking just its first n elements
> seems
> crippling in this situation.
>
> I don't know what the proper solution to this is, but the current
> situation
> strikes me as untenable. I had to think through this problem for a while
> before
> I came to a solution that came even close to working, let alone get one
> that
> actually works. Removing elements from a container should _not_ be this
> hard.
> The situation with remove _needs_ to be improved.
>
> - Jonathan M Davis
I believe remove operation is better expressed in terms of a Cursor
(iterator in C++). It also doesn't make much sense (at least of me) for a
find to return a subrange of the source range. While it does generalize
well, it's not really what most people expect.
What I'd love to have instead is something like this:
int[] arr = [1, 2, 3, 4, 5];
auto cursor = arr.find(3); // either points to 3 or null
int[] before = arr[0..cursor];
int value = arr[cursor]; // for consistency, C++ uses *cursor
int[] after = arr[arr.advance(cursor)..$];
Note that typeof(cursor) could/should actually be int*, but I don't use
arr[cursor+1..$] because cursor doesn't know about range bounds, and as
such it can't advance on it's own.
For dynamic arrays:
V[K] dynamicArray;
V* cursor = key in dynamicArray;
dynamicArray.remove(key); // requires an additional lookup, which is
sub-optimal
// dynamicArray.remove(cursor); // optimal but doesn't work atm. I think
it should
I think it's also worth noting that "find" for trees makes even less sense
(to me). You can't "slice" a tree arbitrarily, yet find tries to do that:
> auto rbt = RedBlackTree!int([0, 2, 5, 12, 59, 22]);
> assert(to!string(rbt[]) == "[0, 2, 5, 12, 22, 59]");
>
> auto found = find(rbt[], 5);
> assert(to!string(found) == "[5, 12, 22, 59]");
What is the type of "found"? It's clearly not a tree anymore. Don't know
why, but it's still a range. How is it implemented? Without looking at the
source I can tell it's implemented as an original tree + a cursor. And I
believe the two are better be separated.
As a side note, in my own code I have 2 version of remove - remove and
removeUnstable, which work differently. Simple example:
// swaps current element with last one and pops last
void removeUnstable(T)(ref T[] array, T* cursor)
{
auto length = array.length;
assert(length > 0);
--length;
T* last = array.ptr + length;
assert(array.ptr <= cursor && cursor <= last);
*cursor = *last;
array.length = length;
}
// Moves elements
T* remove(T)(ref T[] array, T* cursor)
{
auto length = array.length;
assert(length > 0);
--length;
assert(array.ptr <= cursor && cursor <= array.ptr + length);
auto numElementsToMove = length - (cursor - array.ptr);
memmove(cursor, cursor + 1, numElementsToMove * T.sizeof);
array.length = length;
return cursor;
}
More information about the Digitalmars-d
mailing list