Safe Cursors and Ranges

Steven Schveighoffer schveiguy at yahoo.com
Thu Aug 26 11:32:54 PDT 2010


On Thu, 26 Aug 2010 13:30:35 -0400, Pillsy <pillsbury at gmail.com> wrote:

> Steven Schveighoffer Wrote:
>
>> On Thu, 26 Aug 2010 11:17:32 -0400, Pillsy <pillsbury at gmail.com>
>> wrote:
> [...]
>> > If you don't mind me asking, what made you give up on it?
>
>> A cursor that refers to one element is much more efficient to pass
>> around/copy.
>
> Ah, yes. If you just need the one element, I figure that you can use
> a range. In most of the cases I'm familiar with, ranges are not
> distinctly less lightweight than array slices, and in D one tends to
> blithely pass slices all over the place. Things could be different for
> iterating over trees---is that where you ran into trouble with the
> idea in dcollections?

I think there is a misunderstanding of how I went about defining  
dcollections cursors.  I never actually tried the tripod idea, I just  
posted it to the newsgroup as a possible solution for dcollections cursors.

Originally in dcollections 1, cursors were iterators -- allowing movement  
of the cursor as well as dereferencing.  This was way way back, before  
ranges were a prevalent concept and I was trying to write a better  
collection library for tango.

When negotiating with Andrei on allowing me to have cursors (when I hoped  
dcollections could be a part of phobos), I went through several different  
ideas to try and make cursors safe.  I ended up trying to define a simple  
reference type that didn't allow iterating.  That is, a reference to a  
single item that cannot move, thereby removing the unsafe nature of  
iterators.  I ran into the problem that the "end" cursor was pointing to  
an invalid element, so you still had to rely on the coder not referencing  
it, just like in C++.  I decided I'd add a bool to the cursor to let it  
know it was invalid, and the idea hit me all of a sudden -- that's just a  
0 or 1-element range depending on the bool.

So the current incarnation was born.  And I have to say, it's quite  
usable.  Once I defined it that way, the code almost wrote itself.

The issue with using ranges to refer to single elements is, it's difficult  
to keep a node-based range intact, for later use.  For example, if you  
want to refer to a single element in a tree, and you use a range to do  
that (assuming it's a range with a start and end node), things can  
happen.  More elements could be inserted between them, the end marker of  
your range could get removed, etc.  Then when you pass the range to the  
tree's remove function, you're giving it an invalid range, or removing  
more elements than you wanted.  Cursors are not susceptible to these  
problems, since they only ever refer to one element.

So there *is* value in having a special cursor type that only refers to  
one element vs. a range, regardless of the copy performance.

>> Because some cursors do not need the hard reference, it is trivial
>> to tell whether such cursors belong to a range.  For instance, it's
>> trivial to tell that a pointer is part of an array.  So a cursor for an
>> array would not need to refer the range it belongs to.
>
> Ah, I see. That would make things harder to use for the specific use
> case I'm thinking of, where one has a range one would like to split
> at a certain element (which you might have found any number of
> ways). Basically, you can just pass around the cursor, instead of
> needing the cursor and the range.
>
> This strikes me as likely to be very helpful for three-legged races^W
> cases.

Sure, but you can build such a type and its corresponding splitter  
function (which actually could be a member of the tripod) in terms of the  
low-level function which are more powerful because they accept two  
different entities -- the range and the cursor.

>> For instance, if I have an array, and a cursor in that array, it's  
>> trivial
>> for an allBefore operation to determine that the cursor is a member
>> of the array. Now, let's say we have two slices that contain the
>> same cursor, in  your scheme, in order to use one cursor against
>> the other range, you need  to recompose the cursor, which seems
>> unnecessarily awkward.
>
> The way I see it, two different slices can't contain the same cursor;
> a cursor into a slice will always be associated with that slice and no
> other. Maybe there are important use cases I'm missing by defining
> things this way, but I don't know what they might be.
> [...]

Not sure what the use cases would be, but defining the simplest thing  
possible and then building more complex concepts on top of them usually  
allows more possibilities.

>> For example, if I want to store a link list element cursor so I can
>> change it later, why do I also have to carry around the baggage of
>> the range it came from?
>
> You may not, but why not just hang on to the range it came from
> in the first place instead of using the cursor? What's in the link-list
> range besides a pointer to a node anyway?

The pointer to the possibly invalid end node.  See my above description.

> [...]
>> > What I'm thinking is that cursors aren't skinny ranges, they're
>> > ranges that are even fatter. You simply wouldn't use one
>> > where a range would suffice. Allowing questions about whether
>> > a range contains a cursor or not strikes me as just buying
>> > trouble.
>
>> But you don't always care if a cursor is a part of a range.  For
>> example, to change or refer to a specific element of a range,
>> why do you need to refer to the range also?
>
> You don't, but what do you lose by referring to the range also? A
> few words on the stack (sticking with the idea that ranges are value
> types)?

It becomes awkward when you are passing a range around but are saying  
"well, really, I'm only passing you a reference to the first element,  
ignore the rest of it".  You end up naming things like  
removeFirstElement(range r) instead of remove(cursor c).  This is from my  
experience with defining dcollections and a std.container redblacktree.

> [...]
>> I don't have a problem with defining such a three-legged type,
>> but can we call it something besides a cursor?
>
> Sure. How about Tripod? 8^)

I like it :)

-Steve


More information about the Digitalmars-d mailing list