Iterators recap

Fri Dec 15 06:18:38 PST 2006

Bill Baxter wrote:
> Sean Kelly wrote:
>> Bill Baxter wrote:
>>
>>> Sean Kelly wrote:
>>>
>>>> Bill Baxter wrote:
>>>>
>>>>> Sean Kelly wrote:
>>>>>
>>>>>> Bill Baxter wrote:
>>>>>>
>>>>>>>
>>>>>>> Too bad the iterator discussions fizzled out without anything 
>>>>>>> getting decided.
>>>>>>
>>>>>>
>>>>>> I think the design was pretty well settled when things ended.  The 
>>>>>> next step would be a sample implementation so folks could wrangle 
>>>>>> over syntax.  It's on my "to do" list, but I'm a bit short on free 
>>>>>> time at the moment.
>>>>>
>>>>>
>>>>> Huh.  So what is your understanding of what the consensus was?
>>>>
>>>>
>>>> Java-style, with random access iterators overloading array 
>>>> operations. I can't remember if there was any clear preference for 
>>>> the hasNext/getNext vs. the atEnd/getVal approach, but I tend to 
>>>> favor the latter.

Yes, this was basically the consensus.

>>>
>>>
>>> I like next/value/(ptr) myself.
>>> Makes for very succinct while loops.  ptr available where it makes 
>>> sense.
>>>
>>> while(iter.next()) {  writefln(iter.value); }
>>>
>>> while(iter.next()) {  *iter.ptr = 17; }
>>
>>
>> It's great for loops, but can be confusing when iterators stick around 
>> for a while.  Say I have some code that operates on the current value 
>> of an iterator.  With the next() approach, how do I know whether the 
>> iterator is valid?  Still, perhaps a hybrid approach is best.  Have 
>> next return true on success and offer an atEnd property as well.
> 
> Good point.  There's another problem related to iters sitting around a 
> while, too, that I hadn't considered.  If you have to call next() to get 
> the first item then you'll also need some way to query if the iterator 
> is in it's "pre-begin" state.  So the next/atend approach only really 
> works when there's no 'value' and 'next' is the only way to get the 
> value.  Ick.
> 
> So maybe a for-ish way is better than while-ish way:
> 
> for (; !iter.end; iter.next) {
>     writefln(iter.value);
> }
> 
> And have iterators start off valid (if not already at the end).

Yes, this may be better. This was basically the point the last iterator 
discussion reached. I would like to make this even more fun by throwing 
in bidirectional iterators. :)

Just to clarify my terminology here, a forward iterator needs three things:
1(R) a way to reference the data
2(T) a way to traverse the data
3(S) a way to signal the end of the data

If one wants an interface with less than 3 methods, the above behaviors 
need to be combined in some way. In 
news://news.digitalmars.com:119/eiqltc$22c1$1@digitaldaemon.com I 
briefly reviewd three different ways used by three different languages 
(Java, C#, Python).

Java combines (R+T) through next() and (S) through hasNext()

C# combines (T+S) through MoveNext() and (R) though Current()

Python combines (R+T+S) in next() (throwing an exception for S)

There are downsides with all approaches.

C# requires the iterator to initially be in an invalid state. 
Java/Python disallows peek()ing without traversal.
Python's exception requires more scaffolding at the use site.

Now to bidirectional iterators.

Separating R, T, S as three different methods, here are two ways a 
bidirectional iterator could be implemented:

The value-style:

BidirectionalIterator1(T) {
	T* begin;
	T* end;
	T* current;

	bool hasValue() { return cursor != end && cursor != (begin-1); }
	void moveNext() { current++; }
	void movePrevious() { current--; }

	T value() { return *current; }
	T value(T v) { return *current = v; }
}

Silly usage example:

auto i = someIterator;
while(i.hasValue) {
	if (i.value > x)
		i.moveNext;
	else
		i.movePrevious;
}

I like how succinct this turns out, but it has the disadvantage of 
internally needing two invalid values, one at the start and one at the 
end. Considering pointers internally, this means a pointer pointing 
before the start of an array, which is very bad. The other disadvantage 
is hasValue that has to check for both of these two cases.

One alternative is the Java style version. Where the pointer/cursor lies 
conceptually between the elements instead of at an element, giving:

BidirectionalIterator2(T) {
	T* begin;
	T* end;
	T* cursor;

	bool hasNext() { return cursor != end; }
	bool hasPrevious() { return cursor != begin; }
	void moveNext() { cursor++; }
	void movePrevious() { cursor--; }

	T next() { return *cursor; }
	T next(T v) { return *cursor = v; }

	T previous() { return *(cursor-1); }
	T previous(T v) { return *(cursor-1); }
}

auto i = someIterator;
while(i.hasNext) {
	if (i.next > 3)
		i.moveNext;
	else
		i.movePrevious;
}

The disadvantage is requiring two additional methods (hasPrevious and 
previous), but the advantage is that it doesn't need more than 1 illegal 
value (pointing 1 past the end, which is well supported).

Renaming next/previous into front/back or something may give more or 
less confusion. Probably the former.

Apart from different method names, are there other variants?

/Oskar