Iterators for D

Mon Nov 6 17:20:56 PST 2006

Walter Bright wrote:
> Kirk McDonald wrote:
>> Walter Bright wrote:
>>> Overloading * doesn't work with D. But, instead:
>> What are the arguments against that, again?
> 
> Because D has many cases of implicit dereferencing, I thought it would 
> cause more trouble than it's worth.
> 
>> I do not recall, and it's worth at least reviewing why we don't have 
>> opDeref and opDerefAssign.
>>
>> As you know, Walter, a major benefit of C++-style iterators is that 
>> they are semantically as close as possible to pointers. Would [] be 
>> changed to dereference pointers?
> 
> You've always been able to:
>     int* p;
>     p[1] = ...
> 

Wouldn't that be p[0] = ...?

Is that how you intend to use opIndex to "dereference" the iterator? 
That's /terrifying/. Although it is consistent with pointers, using [0] 
for everything is just asking for trouble with regards to 
non-random-access iterators. And though it's probably not totally 
meaningless to a random reader of the code, it comes pretty darned close.

>> I sure hope not. If we go this route, I would suggest adding these 
>> unary-* overloads.
>>
>>> Overload opIndex for rvalue access
>>> Overload opIndexAssign for lvalue access
>>>
>>
>> I think you mean opSlice and opSliceAssign.
> 
> No, I meant Index.
> 
>>> Overloading opIndex also will work for random access
>>>
>>> foreach loops will not be able to have a 'key' parameter.
>>
>> So, I would assume these iterators would work something like this:
>>
>> If a class provides .begin and .end, and these both return the same 
>> type, and that type provides the requisite iterator methods, it is 
>> foreach-able.
> 
> Yes, though they will return a value, not a type.
> 

s/these both return the same type/their return types are the same/

>> If a class provides .begin, .end, and opApply, one of the two 
>> iteration methods has to take precedence. I would hope it's opApply.
> 
> I was leaning towards the other way around.
> 

It is slightly easier and clearer to explicitly say:

foreach(e; t.begin) { }

than

foreach(e; &t.opApply) { }

opApply is an operator overload. .begin would just be a special method. 
I would expect the operator to take precedence.

>> For a type T, its associated iterator type should be available via 
>> T.iterator. This has to be standard.
> 
> It's not needed, as typeof(T.begin) will do it.
> 

Due to the way in which D's type inference from properties works, you'll 
need typeof(T.begin()). Otherwise, it will derive the type of the method 
itself. (Which isn't even a valid type, as such.)

>> An iterable type T MUST provide these methods:
>> I begin()
>> I end()
> 
> Yes.
> 
>> An iterator type I MUST provide the following overloads:
>> E opSlice()
> 
> No. opIndex()

I'm referring specifically to the no-index form of opSlice (meaning i[] 
would "dereference" the iterator). (A no-index opIndex doesn't work.) 
This of course is inconsistent with pointers, so saying i[0] is better 
on that count.

> 
>> I opPostInc()
> 
> Probably opAddAssign()
> 

Oh, there's another one:

int opEquals(I)

>> E is the type of an element in the collection.
> 
> 
>> These are enough to describe a read-only forward iterator, which is 
>> adequate for foreach. In other words, given a type T that meets the 
>> requirements outlined above, the following is adequate to iterate 
>> through it:
>>
>> foreach(e; t) {
>>     // ...
>> }
>>
>> 'e' would be of type E.
> 
> Yes.
> 
>> We run into a curious problem involving pre-increment, which uses 
>> opAddAssign. Classes for which random-access is problematic should not 
>> be required to provide this, which is essentially a random-access 
>> method. There are two options here:
>>
>> 1) Only require that iterators support opPostInc. (This further 
>> removes them from actual pointers.)
>>
>> 2) Introduce a convention whereby a non-random-access iterator throws 
>> an exception if any value other than 1 is passed to its opAddAssign 
>> overload.
> 
> I could go either way with that.
> 

Neither one is particularly elegant.

>> Extending these rules for bidirectional iterators and random-access 
>> iterators, as well as read-write iterators, is left as an exercise to 
>> the reader. (And not a very difficult one.)
>>
>> All of these strange edge-cases and differences between pointers and 
>> possible iterator types convince me that C++-STYLE ITERATORS ARE NOT 
>> FOR D. D, unlike C++, does not supply enough operator overloads for 
>> their semantics to be identical to those of pointers, which was the 
>> whole point of C++-style iterators in the first place.
> 
> I think it does provide enough, in fact, it will be less wacky than in 
> C++ (look at the wretched iterator/const_iterator dichotomy). It'll work 
> with core arrays, and will not need those typedefs.
> 

Without the ability to overload the dereference operator, any attempt to 
write a class that behaves like a pointer in D will be unwieldy, or at 
least ugly. Admittedly, this isn't much of an issue when just using 
foreach, but might be an issue with any kind of STL-like algorithms 
library. (Which is what you're inviting when using C++'s iterator 
semantics.)

> 
>> Instead, we should look to other languages for inspiration. I suggest 
>> Java and Python, whose iterator semantics are similar. Other posters 
>> have already made suggestions on this front. I would only suggest the 
>> "opIter" method for getting an iterator, and that the iterator class 
>> itself be available via T.iterator (whether it is a nested class or an 
>> alias shouldn't matter). Classes can provide methods for returning 
>> alternate iterators, as well. (A class providing opIter should be 
>> iterated over as easily as an iterator itself.)
> 
> 

-- 
Kirk McDonald
Pyd: Wrapping Python with D
http://pyd.dsource.org