RFC on range design for D2

Andrei Alexandrescu SeeWebsiteForEmail at erdani.org
Wed Sep 10 13:12:17 PDT 2008


Steven Schveighoffer wrote:
> "Andrei Alexandrescu" wrote
>> Notice that "a range can't grow" is different from "a range can't be 
>> assigned from a larger range". In particular, a range operation can return 
>> a range larger than both input ranges. But not larger than their union 
>> :o).
> 
> Yes, so every time you add an element you have to update your forward range 
> from the 'all' range so it includes the new element at the end.  Every time 
> you remove an element, you have to update your reverse range from the 'all' 
> range so it excludes the element at the beginning.  Failure to do this 
> results in invalid ranges, which seems to me like a lot more work than 
> simply not doing anything  (in the case of an iterator).  The pitfalls of 
> using ranges for dynamically changing containers might outweigh the 
> advantages that they provide in certain cases.

No, this is incorrect. I don't "have to" at all. I could define the 
behavior of range as you mention, or I could render them undefined. 
Iterators invalidate anyway at the drop of a hat, so they're none the 
wiser. You can't transform a lack of an advantage into a disadvantage.

"Look at this pineapple. It's fresher than the other, and bigger too."

"No, it's about as big. That watermelon sucks."

>>> My belief is that ranges should be the form of input to algorithms, but 
>>> iterators should be provided for using containers as general data 
>>> structures.  Similar to how strings are represented by arrays/slices, but 
>>> iterators (pointers) exist if you need them.
>> If we agree it's better without iterators if not needed, we'd need a 
>> strong case to add them. Right now I have a strong case against them.
> 
> I don't need to worry about whether you have them or not, I can always 
> implement them on my own ;)  Really, range/iterator support doesn't require 
> direct support from the compiler (except for builtin arrays), and any 
> improvements made to the compiler to support ranges (such as reference 
> returns, etc) can be applied to iterators as well.
> 
> I think ranges are an excellent representation when a range of elements is 
> needed.  I think a cursor or iterator is an excellent representation when an 
> individual position is needed.
> 
>>> I'll probably move forward with this model in dcollections, I really like 
>>> the range idea, and in general the view on how ranges are akin to slices. 
>>> But I also like having access to iterators for other functions.
>> Which functions?
> 
> Functions which take or return a single position.  Such as 'erase the 
> element at this position' or 'find the position of element x'.

I agree. In fact I agreed in my original document, which I quote: 
``Coding with ranges also has disadvatages. Some algorithms work 
naturally with individual iterators in the "middle" of a range. A 
range-based implementation would have to maintain a redundant range 
spanning e.g. from the beginning of the container to that middle.''

However, I could meanigfully rewrite std.algorithm to work on ranges 
alone. The disadvantage does exist but is minor, For example, find does 
not return an iterator. It simply shrinks its input range until the 
element is found, or until it is empty. That way you can nicely use the 
result of find iteratively.

Range find(alias pred = "a == b", Range, E)(Range haystack, E needle)
{
     alias binaryFun!(pred) test;
     for (; !haystack.isEmpty; haystack.next)
     {
         if (test(haystack.first, needle)) break;
     }
     return haystack;
}

This is quite a few bites smaller than the previous version, which is 
now to be found in std.algorithm:

Iterator!(Range) find(alias pred = "a == b", Range, E)(Range haystack, E 
needle)
{
     alias binaryFun!(pred) test;
     for (auto i = begin(haystack); i != end(haystack); ++i)
     {
         if (test(*i, needle)) return i;
     }
     return end(haystack);
}

It has two less variables, and VERY importantly, one less type to deal 
with. Arguments aired against primitive ranges systematically omit this 
important simplification they bring. When you don't weigh in the 
advantages, of course all there is to be seen are the disadvantages.

Better yet, when find does return, client code's in better shape because 
it doesn't need to compare the result against end(myrange). It can just 
test whether it's empty and be done with.

So a newcomer to D2 would have to have an understanding of containers 
and ranges. Containers own data. They offer various traversals to crawl 
them in the form of ranges. Ranges are generalized slices.

If iterators are thrown into the mix, things get more complex because 
iterators are a lower-level primitive, a generalized pointer. So the 
newcomer would have to ALSO understand iterators and deal with functions 
that require or return either. They'd have to learn how to pair 
iterators from ranges and how to extract iterators from ranges (more 
primitives). They'd also have to understand when it's better to hold on 
to a range (most of the time) versus a naked iterator (seldom and for a 
dubious convenience).

I /understand/ there are advantages to iterators. Just don't forget the 
cost when discussing them.

I am also sure that if I sat down long enough contemplating my navel I 
could come with more examples of iterators=good/ranges=bad. I am also 
sure that if I continued doing so I could also figure cases where a 
doubly linked-list iterator that "knows" whether it's atBegin or atEnd 
could come in handily. In fact how about this imaginary discussion 
between Stepanov and his imaginary colleague Tikhonov:

Stepanov: "Here, I have these cool iterators. I can express a great deal 
of stuff with them. It's amazing."

Tikhonov: "Ok, I have a doubly-linked list. An iterator is a node in the 
list, right?"

S: "Da. Those are bidirectional iterators because they can move in two 
directions in the list."

T: "Ok, my first element has prev == NULL and last element has next == 
NULL. Does your iterator know when it's at the begin and at the end of 
the list?"

S: "No. You see, you'd have to compare two iterators to figure that out. 
Just pass around an iterator to the beginning and end of the list 
fragment you're interested in, as you find fit."

T: "That sucks! I want an iterator to move up and down and tell me when 
it's at the beginning and at the end, without another stinkin' iterator."

S: "I implemented a great deal of algorithms without needing that. What 
algorithms of yours can't work with a comparison instead of atBegin and 
atEnd?"

T: "Well, I need to think of it. Maybe some video buffer or something."

S: "That works. You just save the iterator at the beginning of the 
sliding window. Then you compare against it."

T: "But I don't like that. I want you to define atBegin and atEnd so I 
don't need to carry an extra laggard!"

S: "Then what algorithms would fundamentally rest on that particular 
feature?"

T: "No idea. Here, let me look at my navel."

S: "While you do that, let me ask you this. Whatever those algorithms 
are, they can't work on a circular list, right?"

T: "I guess not. atBegin is never true for a circular list, unless, 
damn, you keep another iterator around to compare with."

S: "So those algorithms of yours would work on a doubly-linked list but 
not on a circular list. Are you sure you care for that distinction and 
that loss in generality?"

T: "Gee, look at the time. It's Friday evening! Let's go have a beer."


Andrei


More information about the Digitalmars-d-announce mailing list