foreach vs. iterators

Fri Nov 3 16:51:50 PST 2006

Walter Bright wrote:
> Sean Kelly wrote:
> 
>> Forgive me for resurrecting a horribly battered equine, but I was 
>> thinking about this a bit tonight and I've decided that foreach, 
>> foreach_reverse, and array operations are not a sufficient general 
>> substitute for C++ iterators.
>>
>> For example, let's say I want to compute the set union of two sorted 
>> ranges, one of size M, the other of size N.  By iterating across the 
>> two ranges simultaneously is is easy to see that the complexity for 
>> the operation will be max(M,N).  In C++ this is easily accomplished 
>> using forward iterators, and the ranges can be anything from arrays to 
>> SQL result sets.
>>
>> In D however, there is no way to iterate across multiple ranges 
>> simultaneously using foreach, so opIndex must be used instead.  With 
>> containers that have a constant-time lookup cost this isn't a big 
>> deal, but what if these containers are binary trees?  The complexity 
>> for such an operation would be M(log(M)) + N(log(N)).
>>
>>  From this it seems clear that foreach is not sufficiently adaptable 
>> to meet the needs of all sequential algorithms and opIndex is not a 
>> reasonable substitute for all cases where foreach may not be used.  
>> What alternatives do we have?  So far, iterator variants are all I've 
>> been able to come up with.
> 
> You are correct in regards to looping across multiple aggregates 
> simultaneously.
> 
> On the other hand, C++ iterators aren't a general substitute for 
> opApply. For example, it is not at all easy to turn a recursive data 
> structure (i.e. binary tree) into a linearized iterator.

If I'm not missing the essence here, there are several kinds of 
iterators, and not every data structure should try to provide all of 
these. For example, an iterator into a tree structure might be 
breadth-first or depth-first, but one wouldn't expect a linearized (if I 
understand the term in this context at all) iterator to exist. 
Conversely, an array is hardly thinkable without a linear iterator.

What Stepanov explained to me at length (back when I was studying CS and 
there was not a single book published on the STL) was that the different 
containers all have their specific sets of "natural" iterators, and that 
only where it is convenient or at all implementable, other iterators may 
exist to them. And that it's not like you have a container and then need 
a certain iterator to it, but that you choose the container based on 
what kind of iterators you need in the first place.

Then it's about the iterator just being a way to reach every item 
exactly once. And if you don't seem to get them in the order you want, 
then you sort the items, say into a list. (Or rethink the problem.)

Back to foreach: it's syntactically a very convenient way to go through 
a trivial collection (i.e. an array or a list). But for non-trivial 
containers (or data structures), foreach simply is insufficient. You 
need an iterator. Iterators were invented for the very purpose of 
walking through a black-box container, if you will. I'd almost say that 
container-specific iterators are the only way to guarantee the lowest 
complexity walk-through of an arbitrary black-box container.

That said, it is of course customary to provide "wrong" iterators to 
containers (where at all possible, as conveniences), for example a 
random access iterator to a linked list. But then the user should stay 
aware that using such is only for the cases where the choice of 
container has already been evaluated to be optimal, and where one only 
occasionally needs to access some of the items in a specific way. In 
other words, where this need is seldom enough to not warrant a temporary 
copy (or linking) of the items into a more suitable container.

One could of course have foreach actually employ iterators. This gives 
the ease of coding with foreach combined with the flexiblity, decoupling 
and orthogonality offered by iterators.

But, as in the previous post, even this becomes cumbersome when we're 
iterating over more than one container simultaneously, especially when 
they're of different sizes or iterating in non-lockstep.

---

In general, the essence and elegance of the STL has not been too 
prevalent in D libraries or algorithms. With the current expressive 
power of D, one might expect to soon start seeing some of this.