Safe Cursors and Ranges

Thu Aug 26 07:46:40 PDT 2010

On Thu, 26 Aug 2010 10:18:51 -0400, Pillsy <pillsbury at gmail.com> wrote:

> These thoughts were inspired by the recent thread, "Retrieving the
> traversed range". I am generally pretty sold on the range concept,
> but I think for some operations some sort of cursor is required.
> However, there are some unsafe cursor-related operations that are
> best avoided.
>
> However, as far as I can see, the unsafe operations revolve around
> synthesizing a new range out of two cursors. However, you can get
> a lot of the desired functionality associated with finding elements and
> splitting ranges *without* allowing the synthesis of of ranges from
> unrelated cursors. The key to doing this is to have each cursor be  
> permanently bound to the underlying range which it was derived from.
>
> If you have this kind of cursor, you only really need it to support two  
> operations to get a huge amount of benefit:
>
>      Range before ();
>
>      Range after ();
>
> The before () operation just returns a range of the appropriate type  
> (Forward, Bidirectional, RandomAccess) of all the elements that come  
> before the cursor is pointing to, while the after () operation returns a  
> range of the appropriate type with all the elements after the cursor.  
> Given this concept, a cursor really points at the boundary between  
> elements of a range, if you will.
>
> If find and related operations return this kind of cursor, all of the  
> operations involving splitting ranges become trivial. It's completely  
> consistent for either before () or after () to be empty, and since you  
> can't glue two of these cursors together, you are always guaranteed to  
> get well-defined ranges out.
>
> While I think the before () and after () operations are the only ones
> which are strictly necessary for the desiderata I've seen so far, people
> could write their own generic algorithms if cursors supported more of
> the iterator operations, for moving forward (and backward, where  
> appropriate) and inspecting the element immediately after the cursor (or  
> before, where appropriate), and if ranges supported operations to return  
> cursors from reasonable places (like from the front or back, or from a  
> specific indexed elemnt for RandomAccess ranges).
>
> The key idea is that these cursors aren't a primitive part of a range;  
> instead, they take a range and add a position *inside* the range.  
> They're a perfect fit for all those "three-legged" operations out there  
> because they actually have three legs.

That's exactly how cursors perform in dcollections.  An excerpt from  
dcollections' concepts document:

Cursors:

Cursors are special 1 or 0 element ranges.  They implement the forward  
range
interface.  The benefit to using cursors is that they always refer to  
exactly
one element.  A normal range uses two marker elements, a begin and an end
element.  Therefore, cursors are less susceptible to invalidation on  
removal of
elements.

Cursors support these additional features:

- Use a cursor as a reference point when dealing with a container.
- Use as an end point when slicing a container.  Note that some containers  
only
   support slicing with an arbitrary cursor and the beginning or end of the
   container.  Slicing is guaranteed to be a fast operation (lgn or better).

Determining cursor/range ownership:

All collections can identify whether a cursor/range belongs to them.  Each  
collection class has a 'belongs' method to determine ownership.  The  
ownership check is guaranteed to be a fast operation (O(lgN) or better).

-------------------

The 'three-legged' cursor as you propose was an early idea of mine too.   
Because dcollections containers are classes however, there is an easy  
point of ownership to establish -- the class reference.  So a dcollections  
cursor doesn't have to refer to its owner, but it can in order to fulfill  
the requirement.

My view is that a cursor should be separate from a range, and a range  
should either be able to tell if a cursor is part of it (@safe) or be told  
that a cursor is a part of it (@system or @trusted).  Inside a  
three-legged algorithm which accepts a range as input, and generates a  
cursor from the range for the third leg, it can force the range to assume  
the cursor is a part of it, knowing that it's true.  You can make the  
second operation @safe by wrapping a cursor + range in a struct that  
generates the cursor from the range (and therefore knows the cursor is a  
part of the range).

For two examples of ranges that can tell if a cursor is a part of it, an  
array can tell if a cursor is a part of it without any extra mechanisms  
because it is an O(1) check for an interior pointer.  A balanced tree can  
also perform an O(lgn) check that a cursor is part of it (though it's  
questionable whether a tree is a range).

-Steve