protocol for using InputRanges

monarch_dodra monarchdodra at gmail.com
Tue Mar 25 13:56:46 PDT 2014


On Tuesday, 25 March 2014 at 20:15:32 UTC, Walter Bright wrote:
> It's pretty clear that:
>
> 1. the protocol is COMPLETELY undocumented and undefined.

http://dlang.org/phobos/std_range.html#isInputRange

The semantics of an input range (not checkable during 
compilation) are assumed to be the following (r is an object of 
type R):
r.empty returns false iff there is more data available in the 
range.
r.front returns the current element in the range. It may return 
by value or by reference. Calling r.front is allowed only if 
calling r.empty has, or would have, returned false.
r.popFront advances to the next element in the range. Calling 
r.popFront is allowed only if calling r.empty has, or would have, 
returned false.

> We want to appeal to the high performance coders. To maximize 
> performance, ranges should be optimized to the inner loop case, 
> which is:
>
>     while (!r.empty) { auto e = r.front; ... e ...; 
> r.popFront(); }

This makes the assumption that r.front is copy constructible at 
all. It also makes the assumption that you want to operate on a 
copy, rather than the actual element in the range.

Finally, it means having to declare a local object: It merely 
means shifting the burden of caching from one context to another. 
If the object is large, chances are you are better off just 
calling front instead of making a copy. Especially if the loop is 
trivial.

If you want high performance, then arguably, just provide a O(1) 
front, and a O(1) empty.

Also, certain ranges, such as "filter" *must* access the front of 
the previous range more than once. Unless you are suggesting we 
add a field for it to cache the result of the previous range?

> With the additional proviso that ranges are passed to 
> algorithms by value, so they should be cheap to copy. Cheap to 
> copy usually means them being small.

Unfortunately yes. That said, any range that does anything will 
have at least two fields, one of which is a slice, or comparable 
to in terms of size, so it's going to be big anyways. So you 
*are* better off passing by ref if you can regardless, unless 
your range is *really* trivial.

I agree that range sizes can be a problem.

> A) I know that my range is not empty, so I can skip calling 
> empty.
>
> Since front is guaranteed to succeed if !empty, this puts a 
> requirement on many ranges that they have a non-trivial 
> constructor that 'primes the pump'. Of course, priming may 
> fail, and so construction may throw, which is not good design.

If the prime fails, then the range can simply be marked as empty. 
Then if you decide to skip calling empty anyways, it's your own 
fault.

> And lastly, it means that lazy ranges will be required to read 
> the first element, even if the range isn't then subsequently 
> used, which defeats what one would expect from a lazy range.

I'm not yet convinced of adding special code for ranges that 
aren't used. I've heard of these kinds of ranges, but I've 
observed that when you declare one, it almost always ends up 
being used. I don't think we should waste efforts on this rare 
usecase.

As for "Lazy", in range terms, it mostly only means you calculate 
things element at once, as you go up the range chain. As opposed 
to processing the entire input data, one transformation at a time.

Evaluating an element "on use" as opposed to "1 instruction 
before use" doesn't make much of a change in this context.

> All this saves for the user is one call to empty for the entire 
> algorithm, at a cost incurred with every iteration. I.e. it 
> selects O(n) to save O(1).

If code that was actually meant to *do* something was in 
popFront() to begin with, then there'd be no extra overhead.

I've found that if you are creative enough, you can usually 
design the range in such a way that it works efficiently, 
lazilly, and without flags.

> Hence, I propose that the protocol for using input ranges be 
> defined as:
>
>     while (!r.empty) { auto e = r.front; ... e ...; 
> r.popFront(); }
>
> This makes it possible to build pipelines that are firehoses 
> with no kinks or constrictions in them. It optimizes for the 
> inner loop case, not boundary cases.

I get where you are coming from, but it's simply not manageable 
in a generic fashion, if you want to be able to preserve all the 
power and the diversity of the ranges we have.

The protocol you are suggesting would prevent us from doing a lot 
of the lovely things that ranges empowers us with.


More information about the Digitalmars-d mailing list