protocol for using InputRanges
Andrei Alexandrescu
SeeWebsiteForEmail at erdani.org
Tue Mar 25 14:29:18 PDT 2014
I thought I had only 1-2 comments, but I have a few more.
On 3/25/14, 1:15 PM, Walter Bright wrote:
> 1. the protocol is COMPLETELY undocumented and undefined.
s/COMPLETELY/loosely/
> Since ranges+algorithms are central to D, this is a very serious
> problem.
Agreed.
> We want to appeal to the high performance coders. To maximize
> performance, ranges should be optimized to the inner loop case, which is:
>
> while (!r.empty) { auto e = r.front; ... e ...; r.popFront(); }
Actually generic code often prefers:
while (!r.empty) { ... r.front ...; r.popFront(); }
That saves copying r.front if it returns by ref. A bunch of algos in std
do that.
> Since front is guaranteed to succeed if !empty, this puts a requirement
> on many ranges that they have a non-trivial constructor that 'primes the
> pump'. Of course, priming may fail, and so construction may throw, which
> is not good design.
Again, I disagree with this assertion.
> Next, priming requires a buffer in the range.
Priming has nothing do to with the range being buffered. The entire
design of ranges is a one-element buffer.
> Buffers add size, making them slower to copy, and will often require
> another field saying if the buffer has data in it or not, further
> bumping the size.
That's an argument for adding an unbuffered range abstraction.
> All this saves for the user is one call to empty for the entire
> algorithm, at a cost incurred with every iteration. I.e. it selects O(n)
> to save O(1).
I don't understand how that O() reasoning works.
> B) I want to be able to call front multiple times in a row, and expect
> to get the same value.
Correct.
> This can force the range to incorporate a one element buffer and a flag
> indicating the status of that buffer.
It may, but many ranges naturally work that way (e.g. arrays).
> The range instance gets bigger and
> more expensive to copy, and the cost of manipulating the flag and the
> buffer is added to every loop iteration. Note that the user of a range
> can trivially use:
> auto e = r.front;
> ... using e multiple times ...
> instead.
That would pessimize code using arrays of large structs.
> The big problem with (A) and (B) is these costs become present in every
> range, despite being rarely needed and having simple alternatives. This
> is the wrong place to put cost and complexity. The user should be making
> these decisions based on his needs.
>
> Hence, I propose that the protocol for using input ranges be defined as:
>
> while (!r.empty) { auto e = r.front; ... e ...; r.popFront(); }
>
> This makes it possible to build pipelines that are firehoses with no
> kinks or constrictions in them. It optimizes for the inner loop case,
> not boundary cases.
The proposed protocol pessimizes arrays of large structs and will trip
the unwary if calling r.front again returns something else. I don't
think the proposal is good.
Andrei
More information about the Digitalmars-d
mailing list