protocol for using InputRanges

Tue Mar 25 14:42:07 PDT 2014

On 3/25/2014 1:56 PM, monarch_dodra wrote:
> http://dlang.org/phobos/std_range.html#isInputRange
>
> The semantics of an input range (not checkable during compilation) are assumed
> to be the following (r is an object of type R):
> r.empty returns false iff there is more data available in the range.
> r.front returns the current element in the range. It may return by value or by
> reference. Calling r.front is allowed only if calling r.empty has, or would
> have, returned false.
> r.popFront advances to the next element in the range. Calling r.popFront is
> allowed only if calling r.empty has, or would have, returned false.

I overlooked that. Thanks.

>> We want to appeal to the high performance coders. To maximize performance,
>> ranges should be optimized to the inner loop case, which is:
>>
>>     while (!r.empty) { auto e = r.front; ... e ...; r.popFront(); }
>
> This makes the assumption that r.front is copy constructible at all.
> It also
> makes the assumption that you want to operate on a copy, rather than the actual
> element in the range.

It's a reasonable requirement. If your range has an issue with this, it can 
return a pointer to the element, and the element can be a struct with access 
functions. Then, the pointer will work as well as a copy.

> Finally, it means having to declare a local object: It merely means shifting the
> burden of caching from one context to another. If the object is large, chances
> are you are better off just calling front instead of making a copy. Especially
> if the loop is trivial.

This does come up as an issue, and is solvable by returning a pointer as I 
described. It's up to the designer of the range.

> If you want high performance, then arguably, just provide a O(1) front, and a
> O(1) empty.

I don't think the issues can be waved away so easily, or I wouldn't have brought 
this up.

> Also, certain ranges, such as "filter" *must* access the front of the previous
> range more than once.
> Unless you are suggesting we add a field for it to cache
> the result of the previous range?

This is putting the cost where it belongs - when needed on the user of a range, 
rather than on all ranges. It's the "pay only for what you need" idea rather 
than "pay regardless".

>> With the additional proviso that ranges are passed to algorithms by value, so
>> they should be cheap to copy. Cheap to copy usually means them being small.
> Unfortunately yes. That said, any range that does anything will have at least
> two fields, one of which is a slice, or comparable to in terms of size, so it's
> going to be big anyways. So you *are* better off passing by ref if you can
> regardless, unless your range is *really* trivial.

Many very useful ranges are trivial. Or at least they should be. An array, for 
example, is a trivial range.

>> A) I know that my range is not empty, so I can skip calling empty.
>>
>> Since front is guaranteed to succeed if !empty, this puts a requirement on
>> many ranges that they have a non-trivial constructor that 'primes the pump'.
>> Of course, priming may fail, and so construction may throw, which is not good
>> design.
>
> If the prime fails, then the range can simply be marked as empty. Then if you
> decide to skip calling empty anyways, it's your own fault.

Yes, one can add state flags to indicate failed construction, which I'll argue 
is an ugly design. After all, construction is supposed to construct an object or 
fail, not leave the 'constructed' object in a zombie state.

>> And lastly, it means that lazy ranges will be required to read the first
>> element, even if the range isn't then subsequently used, which defeats what
>> one would expect from a lazy range.
>
> I'm not yet convinced of adding special code for ranges that aren't used. I've
> heard of these kinds of ranges, but I've observed that when you declare one, it
> almost always ends up being used. I don't think we should waste efforts on this
> rare usecase.

It's not rare - it's the primary way ranges are used in C# Linq. Should we throw 
this entire category of use cases under the bus just to handle a convenience of 
not needing to call empty in some cases?

> Evaluating an element "on use" as opposed to "1 instruction before use" doesn't
> make much of a change in this context.

Except that it requires the use to start upon construction, which defeats any 
hope of separating construction of a pipeline from using it.

> I've found that if you are creative enough, you can usually design the range in
> such a way that it works efficiently, lazilly, and without flags.

That's not been my experience.

> I get where you are coming from, but it's simply not manageable in a generic
> fashion, if you want to be able to preserve all the power and the diversity of
> the ranges we have.
>
> The protocol you are suggesting would prevent us from doing a lot of the lovely
> things that ranges empowers us with.

Please show me such a case. Note that I've shown above that this power and 
diversity throws an entire use case category under the bus. Secondly, in my 
experiments with ranges, the power and diversity results in slower pipelines.