We need to rethink remove in std.container

Wed Feb 23 05:33:03 PST 2011

On 2/22/11 8:41 AM, Steven Schveighoffer wrote:
> On Tue, 22 Feb 2011 09:23:00 -0500, Andrei Alexandrescu
> <SeeWebsiteForEmail at erdani.org> wrote:
>
>> On 2/22/11 7:58 AM, Steven Schveighoffer wrote:
>>> A cursor in dcollections is actually a zero or one element range. It can
>>> only reference exactly one element. Since it has no references to other
>>> elements, it is immune to operations that use the surrounding elements.
>>
>> At this exact point I had what would be either a good idea or a
>> brainfart.
>>
>> I've been thinking for a good while to define a "singleton range", a
>> range that has at most one element. I had the intuition that a
>> singleton range is a useful concept, but I felt it was a bit tenuous
>> to argue in its favor (mostly because one could easily simulate it
>> with repeat(x, 1)), so I never put it in std.range.
>>
>> The definition would go like this:
>>
>> auto singletonRange(T)(T element)
>> {
>> static struct Result
>> {
>> private T _element;
>> private bool _done;
>> @property bool empty() { return _done; }
>> @property auto front() { assert(!empty); return _element; }
>> void popFront() { assert(!empty); _done = true; }
>> auto save() { return this; }
>> }
>>
>> return Result(element, false);
>> }
>
> This is almost the same as a dcollections cursor. The only difference
> is, popFront in a cursor not only sets the '_done' member, but also
> moves to the next element. This distinction is necessary with node-based
> containers that have an end marker node. So the cursor that is returned
> from 'end()' (or from find where the element was not found, etc.) is an
> already-empty cursor.

I thought about this overnight and reached the same conclusion. The 
thing is, what's needed is not one element, but one element of a 
specific range. Here's what I have right now in range.d:

/**
Returns a range with at most one element; for example, $(D
takeOne([42, 43, 44])) returns a range consisting of the integer $(D
42). Calling $(D popFront()) off that range renders it empty.

Sometimes an empty range with the same signature is needed. For such
ranges use $(D takeNone!R()). For example:

----
auto s = takeOne([42, 43, 44]);
static assert(isRandomAccessRange!(typeof(s)));
assert(s.length == 1);
assert(!s.empty);
assert(s.front == 42);
s.front() = 43;
assert(s.front == 43);
assert(s.back == 43);
assert(s[0] == 43);
s.popFront();
assert(s.length == 0);
assert(s.empty);
s = takeNone!(int[])();
assert(s.length == 0);
assert(s.empty);
----

In effect $(D takeOne(r)) is equivalent to $(take(r, 1)) and $(D
takeNone(r)) is equivalent to $(D take(r, 0)), but in certain
interfaces it is important to know statically that the range may only
have at most one element.

The type returned by $(D takeOne) and $(D takeNone) is a random-access
range.
  */
auto takeOne(R)(R source) if (isInputRange!R)
{
     static struct Result
     {
         private R _source;
         bool _empty;
         @property bool empty() const { return _empty; }
         @property auto ref front() { assert(!empty); return 
_source.front; }
         void popFront() { assert(!empty); _empty = true; }
         void popBack() { assert(!empty); _empty = true; }
         auto save() { return Result(_source.save, empty); }
         @property auto ref back() { assert(!empty); return _source.front; }
         @property size_t length() const { return !empty; }
         auto ref opIndex(size_t n) { assert(n < length); return 
_source.front; }
         auto opSlice(size_t m, size_t n)
         {
             assert(m <= n && n < length);
             return n > m ? this : Result(_source, false);
         }
     }

     return Result(source, source.empty);
}

/// Ditto
auto takeNone(R)() if (isInputRange!R)
{
     return typeof(takeOne(R.init))(R.init, true);
}

unittest
{
     auto s = takeOne([42, 43, 44]);
     static assert(isRandomAccessRange!(typeof(s)));
     assert(s.length == 1);
     assert(!s.empty);
     assert(s.front == 42);
     s.front() = 43;
     assert(s.front == 43);
     assert(s.back == 43);
     assert(s[0] == 43);
     s.popFront();
     assert(s.length == 0);
     assert(s.empty);
     s = takeNone!(int[])();
     assert(s.length == 0);
     assert(s.empty);
}

> I hadn't thought of just setting the empty variable, but I think it
> wouldn't work properly. You could possibly say "a cursor that points to
> an element but is empty is actually pointing *after* the element." But
> then I'm not sure how to construct an end cursor. It's advantageous to
> point at the end of a container, where there is no true node to point
> at. Is there another way to think about this?

Yah, I think the right approach is to base the 0/1 range on an existing 
range, not on an existing value.

>> But now I realized that singleton ranges are your cursors, so I'll
>> definitely add them. And you also made me realize that a singleton
>> range is very different from other ranges in the same way 1 is
>> different from other numbers. Great.
>
> Yes, that distinction is really important, I'm glad you see that!
>
>> This may as well be The Great Unification that was needed to converge
>> std.container with dcollections in a harmonious whole.
>
> I'm not sure what to think of this ;) Is this a "oh what could have
> been" statement or a "let's reopen negotiations" statement?
>
> Note, I would love for dcollections to be mainstream Phobos, but I have
> accepted the current reality and am fine with the way things are now too.

Absolutely, and your collaboration on e.g. RedBlackTree and others is 
very much appreciated.

The way I was thinking is, there would a lot of convergence if 
dcollections wiped the notion of "cursor" and replaced it with takeOne 
throughout. One obvious objection to the range/cursor design is there 
are two abstractions to deal with instead of one. I had the intuition 
that ranges should be enough, but was not quite able to substantiate it 
fully until just about now. With take, takeExactly, and takeOne, it 
seems that indeed there is only need for the range interface, and that 
particular range types can replace cursors.

I realize that there are other major differences in design between 
dcollections and std.container, probably the largest one being the use 
of a hierarchy vs. free-floating types, so probably full unification is 
still not possible or quite remote. Nevertheless, keeping the 
communication channels open would make both libraries better and all of 
us a bit wiser.

Andrei