We need to rethink remove in std.container

Tue Feb 22 00:43:25 PST 2011

On Tue, 22 Feb 2011 05:55:20 +0300, Jonathan M Davis <jmdavisProg at gmx.com>  
wrote:

> Okay, removing elements from a container sucks right now. You can do  
> stuff like
> removeAny (generally pretty useless IMHO) or removeFront just fine, but  
> removing
> an arbitrary range from a container just plain sucks.
>
> remove takes a range type which is the range type for the container that  
> it's
> on. That makes sense. It's obviously not going to be able to a take an  
> arbitrary
> range and remove that from itself. How would it know which elements in  
> that
> range corresponded to which elements in itself - especially when it  
> could be a
> range which skips elements or something similar? So, _that_ part makes  
> sense.
>
> But have you actually tried to get a range of the appropriate type to  
> remove
> from a container? It seems like almost ever function in std.range and
> std.algorithm returns a new range type, making them completely useless  
> for
> processing a range to be removed from a container.
>
> I was looking to remove a single element for a RedBlackTree. The best  
> function
> that I could think to get the proper range was findSplit. The middle  
> portion of
> the return value would be the portion that I was looking for. I'd take  
> that and
> pass it to RedBlackTree's remove. Wrong. It uses takeExactly in its
> implementation and the first two portions of the result of findSplit  
> aren't the
> right range type.
>
> So, what do I do? The _only_ thing that I can think of at the moment is  
> to use
> find to locate the beginning of the range that I want, take the length  
> of the
> range with walkLength and then use popBackN to pop of the elements I  
> don't want.
> e.g.
>
> ======
> import std.algorithm, std.container, std.conv,
>        std.range, std.stdio;
>
> void main()
> {
>     auto rbt = RedBlackTree!int([0, 2, 5, 12, 59, 22]);
>     assert(to!string(rbt[]) == "[0, 2, 5, 12, 22, 59]");
>
>     auto found = find(rbt[], 5);
>     assert(to!string(found) == "[5, 12, 22, 59]");
>
>     popBackN(found, walkLength(found) - 1);
>     assert(to!string(found) == "[5]");
>
>     rbt.remove(found);
>     assert(to!string(rbt[]) == "[0, 2, 12, 22, 59]");
> }
> ======
>
> That's disgusting. All that just to remove one element? And what if the  
> range
> isn't bidirectional (e.g. SList)? Well, then you have no popBack, and as  
> far as
> I can tell, you're screwed, since you can't use either take or  
> takeExactly,
> because both of them return new range types.
>
> In particular, the fact that you can't take a range and create a new one  
> of the
> same type from its first n elements is highly problematic. Maybe we need  
> to add a
> function to ForwardRange which returns a new range with the first n  
> elements (it
> certainly looks like that's the key element that's missing here). I  
> don't know
> if that would be reasonable, but the fact that you can't create a range  
> of the
> same type as the original range when taking just its first n elements  
> seems
> crippling in this situation.
>
> I don't know what the proper solution to this is, but the current  
> situation
> strikes me as untenable. I had to think through this problem for a while  
> before
> I came to a solution that came even close to working, let alone get one  
> that
> actually works. Removing elements from a container should _not_ be this  
> hard.
> The situation with remove _needs_ to be improved.
>
> - Jonathan M Davis

I believe remove operation is better expressed in terms of a Cursor  
(iterator in C++). It also doesn't make much sense (at least of me) for a  
find to return a subrange of the source range. While it does generalize  
well, it's not really what most people expect.

What I'd love to have instead is something like this:

int[] arr = [1, 2, 3, 4, 5];
auto cursor = arr.find(3); // either points to 3 or null

int[] before = arr[0..cursor];
int value = arr[cursor]; // for consistency, C++ uses *cursor
int[] after = arr[arr.advance(cursor)..$];

Note that typeof(cursor) could/should actually be int*, but I don't use  
arr[cursor+1..$] because cursor doesn't know about range bounds, and as  
such it can't advance on it's own.

For dynamic arrays:

V[K] dynamicArray;
V* cursor = key in dynamicArray;
dynamicArray.remove(key); // requires an additional lookup, which is  
sub-optimal
// dynamicArray.remove(cursor); // optimal but doesn't work atm. I think  
it should

I think it's also worth noting that "find" for trees makes even less sense  
(to me). You can't "slice" a tree arbitrarily, yet find tries to do that:

>     auto rbt = RedBlackTree!int([0, 2, 5, 12, 59, 22]);
>     assert(to!string(rbt[]) == "[0, 2, 5, 12, 22, 59]");
>
>     auto found = find(rbt[], 5);
>     assert(to!string(found) == "[5, 12, 22, 59]");

What is the type of "found"? It's clearly not a tree anymore. Don't know  
why, but it's still a range. How is it implemented? Without looking at the  
source I can tell it's implemented as an original tree + a cursor. And I  
believe the two are better be separated.

As a side note, in my own code I have 2 version of remove - remove and  
removeUnstable, which work differently. Simple example:

// swaps current element with last one and pops last
void removeUnstable(T)(ref T[] array, T* cursor)
{
     auto length = array.length;
     assert(length > 0);
     --length;

     T* last = array.ptr + length;
     assert(array.ptr <= cursor && cursor <= last);
     *cursor = *last;

     array.length = length;
}

// Moves elements
T* remove(T)(ref T[] array, T* cursor)
{
     auto length = array.length;
     assert(length > 0);
     --length;

     assert(array.ptr <= cursor && cursor <= array.ptr + length);

     auto numElementsToMove = length - (cursor - array.ptr);
     memmove(cursor, cursor + 1, numElementsToMove * T.sizeof);

     array.length = length;

     return cursor;
}