Safe Cursors and Ranges
Steven Schveighoffer
schveiguy at yahoo.com
Thu Aug 26 07:46:40 PDT 2010
On Thu, 26 Aug 2010 10:18:51 -0400, Pillsy <pillsbury at gmail.com> wrote:
> These thoughts were inspired by the recent thread, "Retrieving the
> traversed range". I am generally pretty sold on the range concept,
> but I think for some operations some sort of cursor is required.
> However, there are some unsafe cursor-related operations that are
> best avoided.
>
> However, as far as I can see, the unsafe operations revolve around
> synthesizing a new range out of two cursors. However, you can get
> a lot of the desired functionality associated with finding elements and
> splitting ranges *without* allowing the synthesis of of ranges from
> unrelated cursors. The key to doing this is to have each cursor be
> permanently bound to the underlying range which it was derived from.
>
> If you have this kind of cursor, you only really need it to support two
> operations to get a huge amount of benefit:
>
> Range before ();
>
> Range after ();
>
> The before () operation just returns a range of the appropriate type
> (Forward, Bidirectional, RandomAccess) of all the elements that come
> before the cursor is pointing to, while the after () operation returns a
> range of the appropriate type with all the elements after the cursor.
> Given this concept, a cursor really points at the boundary between
> elements of a range, if you will.
>
> If find and related operations return this kind of cursor, all of the
> operations involving splitting ranges become trivial. It's completely
> consistent for either before () or after () to be empty, and since you
> can't glue two of these cursors together, you are always guaranteed to
> get well-defined ranges out.
>
> While I think the before () and after () operations are the only ones
> which are strictly necessary for the desiderata I've seen so far, people
> could write their own generic algorithms if cursors supported more of
> the iterator operations, for moving forward (and backward, where
> appropriate) and inspecting the element immediately after the cursor (or
> before, where appropriate), and if ranges supported operations to return
> cursors from reasonable places (like from the front or back, or from a
> specific indexed elemnt for RandomAccess ranges).
>
> The key idea is that these cursors aren't a primitive part of a range;
> instead, they take a range and add a position *inside* the range.
> They're a perfect fit for all those "three-legged" operations out there
> because they actually have three legs.
That's exactly how cursors perform in dcollections. An excerpt from
dcollections' concepts document:
Cursors:
Cursors are special 1 or 0 element ranges. They implement the forward
range
interface. The benefit to using cursors is that they always refer to
exactly
one element. A normal range uses two marker elements, a begin and an end
element. Therefore, cursors are less susceptible to invalidation on
removal of
elements.
Cursors support these additional features:
- Use a cursor as a reference point when dealing with a container.
- Use as an end point when slicing a container. Note that some containers
only
support slicing with an arbitrary cursor and the beginning or end of the
container. Slicing is guaranteed to be a fast operation (lgn or better).
Determining cursor/range ownership:
All collections can identify whether a cursor/range belongs to them. Each
collection class has a 'belongs' method to determine ownership. The
ownership check is guaranteed to be a fast operation (O(lgN) or better).
-------------------
The 'three-legged' cursor as you propose was an early idea of mine too.
Because dcollections containers are classes however, there is an easy
point of ownership to establish -- the class reference. So a dcollections
cursor doesn't have to refer to its owner, but it can in order to fulfill
the requirement.
My view is that a cursor should be separate from a range, and a range
should either be able to tell if a cursor is part of it (@safe) or be told
that a cursor is a part of it (@system or @trusted). Inside a
three-legged algorithm which accepts a range as input, and generates a
cursor from the range for the third leg, it can force the range to assume
the cursor is a part of it, knowing that it's true. You can make the
second operation @safe by wrapping a cursor + range in a struct that
generates the cursor from the range (and therefore knows the cursor is a
part of the range).
For two examples of ranges that can tell if a cursor is a part of it, an
array can tell if a cursor is a part of it without any extra mechanisms
because it is an O(1) check for an interior pointer. A balanced tree can
also perform an O(lgn) check that a cursor is part of it (though it's
questionable whether a tree is a range).
-Steve
More information about the Digitalmars-d
mailing list