protocol for using InputRanges

Tue Mar 25 14:29:18 PDT 2014

I thought I had only 1-2 comments, but I have a few more.

On 3/25/14, 1:15 PM, Walter Bright wrote:
> 1. the protocol is COMPLETELY undocumented and undefined.

s/COMPLETELY/loosely/

> Since ranges+algorithms are central to D, this is a very serious
> problem.

Agreed.

> We want to appeal to the high performance coders. To maximize
> performance, ranges should be optimized to the inner loop case, which is:
>
>      while (!r.empty) { auto e = r.front; ... e ...; r.popFront(); }

Actually generic code often prefers:

while (!r.empty) { ... r.front ...; r.popFront(); }

That saves copying r.front if it returns by ref. A bunch of algos in std 
do that.

> Since front is guaranteed to succeed if !empty, this puts a requirement
> on many ranges that they have a non-trivial constructor that 'primes the
> pump'. Of course, priming may fail, and so construction may throw, which
> is not good design.

Again, I disagree with this assertion.

> Next, priming requires a buffer in the range.

Priming has nothing do to with the range being buffered. The entire 
design of ranges is a one-element buffer.

> Buffers add size, making them slower to copy, and will often require
> another field saying if the buffer has data in it or not, further
> bumping the size.

That's an argument for adding an unbuffered range abstraction.

> All this saves for the user is one call to empty for the entire
> algorithm, at a cost incurred with every iteration. I.e. it selects O(n)
> to save O(1).

I don't understand how that O() reasoning works.

> B) I want to be able to call front multiple times in a row, and expect
> to get the same value.

Correct.

> This can force the range to incorporate a one element buffer and a flag
> indicating the status of that buffer.

It may, but many ranges naturally work that way (e.g. arrays).

> The range instance gets bigger and
> more expensive to copy, and the cost of manipulating the flag and the
> buffer is added to every loop iteration. Note that the user of a range
> can trivially use:
>       auto e = r.front;
>       ... using e multiple times ...
> instead.

That would pessimize code using arrays of large structs.

> The big problem with (A) and (B) is these costs become present in every
> range, despite being rarely needed and having simple alternatives. This
> is the wrong place to put cost and complexity. The user should be making
> these decisions based on his needs.
>
> Hence, I propose that the protocol for using input ranges be defined as:
>
>      while (!r.empty) { auto e = r.front; ... e ...; r.popFront(); }
>
> This makes it possible to build pipelines that are firehoses with no
> kinks or constrictions in them. It optimizes for the inner loop case,
> not boundary cases.

The proposed protocol pessimizes arrays of large structs and will trip 
the unwary if calling r.front again returns something else. I don't 
think the proposal is good.

Andrei