protocol for using InputRanges

Tue Mar 25 16:22:18 PDT 2014

On 3/25/2014 2:29 PM, Andrei Alexandrescu wrote:
>> We want to appeal to the high performance coders. To maximize
>> performance, ranges should be optimized to the inner loop case, which is:
>>
>>      while (!r.empty) { auto e = r.front; ... e ...; r.popFront(); }
>
> Actually generic code often prefers:
>
> while (!r.empty) { ... r.front ...; r.popFront(); }
>
> That saves copying r.front if it returns by ref.

Yes, I agree. The idiom foreach (ref e; r) { ... } must also work.

>> Since front is guaranteed to succeed if !empty, this puts a requirement
>> on many ranges that they have a non-trivial constructor that 'primes the
>> pump'. Of course, priming may fail, and so construction may throw, which
>> is not good design.
>
> Again, I disagree with this assertion.

The throwing part, or the not good design part?

>> Next, priming requires a buffer in the range.
> Priming has nothing do to with the range being buffered. The entire design of
> ranges is a one-element buffer.
>> Buffers add size, making them slower to copy, and will often require
>> another field saying if the buffer has data in it or not, further
>> bumping the size.
> That's an argument for adding an unbuffered range abstraction.

I don't know how that would fit into the ecosystem.

In any case, with the protocol I suggest, the designer of the range is free to 
distribute the functionality into the three functions, however it makes best 
sense. With arbitrarily calling any in any order, then all three need to have 
some sort of signalling mechanism between them.

>> All this saves for the user is one call to empty for the entire
>> algorithm, at a cost incurred with every iteration. I.e. it selects O(n)
>> to save O(1).
>
> I don't understand how that O() reasoning works.

I meant adding extra signalling with every iteration of the loop, rather than 
requiring the user to call empty before front.

>> B) I want to be able to call front multiple times in a row, and expect
>> to get the same value.
>
> Correct.
>
>> This can force the range to incorporate a one element buffer and a flag
>> indicating the status of that buffer.
>
> It may, but many ranges naturally work that way (e.g. arrays).

Sure, but there are far more range types than arrays.

>> The range instance gets bigger and
>> more expensive to copy, and the cost of manipulating the flag and the
>> buffer is added to every loop iteration. Note that the user of a range
>> can trivially use:
>>       auto e = r.front;
>>       ... using e multiple times ...
>> instead.
>
> That would pessimize code using arrays of large structs.

You're already requiring copying with the buffering requirement. And besides, if 
I was creating a range of expensive-to-copy objects, I would add a layer of 
indirection to cheapen it.

> The proposed protocol pessimizes arrays of large structs

Not really more than the existing protocol does (i.e. required buffering).

> and will trip the unwary if calling r.front again returns something else.

I'm not proposing that calling them wrongly would make things unsafe. But I 
don't think it's unreasonable to expect the protocol to be followed, just like 
file open/read/close.

> I don't think the proposal is good.

I don't like being in the position of when I need high performance code, I have 
to implement my own ranges & algorithms, or telling customers they need to do so.