efficient input range buffering

Sun Oct 2 13:09:25 PDT 2011

On 02.10.2011 21:29, Martin Nowak wrote:
> On Sun, 02 Oct 2011 16:46:38 +0200, Dmitry Olshansky
> <dmitry.olsh at gmail.com> wrote:
>
>> On 02.10.2011 18:12, Martin Nowak wrote:
>>> I've written a wrapper to promote input ranges to buffered forward
>>> ranges.
>>> It allows to write range agnostic lexers/parsers with infinite
>>> lookahead.
>>>
>>> Buffering is done through a singly linked list of memory blocks that are
>>> reference counted.
>>> Each saved range holds a reference to all future blocks.
>>> Blocks are recycled when being no longer used.
>>>
>>> https://gist.github.com/1257196
>>>
>>
>> Like it, but one thing catches my eye - why use GC for blocks when the
>> whole thing is already refcounted? Straight malloc/free would be a
>> better fit. Certainly it may use an allocator when we have them.
>>
> I've added that, didn't before as it's always quite a lot of boilerplate
> to get this right.
>
>>> There is a major issue with the somewhat broken
>>> implicit-save-through-copy concept.
>>> A lot of copies in function parameters, foreach loops etc. will also
>>> create
>>> references and thus can be easily responsible for inefficiencies.
>>>
>>
>> While the concept of ForwardRange is quite elegant, I found myself
>> wondering on occasion that InputRange with restore could be way more
>> efficient. Consider:
>>
>> auto r = range();
>> auto save_it = r; ///at least full bitwise copy and/or refCounting
>> if(do_stuff(r))
>> ...
>> else
>> r = save_it;//another one
>> ...
>>
>> Compared to:
>>
>> auto save_pt = r.savepoint(); //different object, minimal overhead
>> if(do_stuff(r))
>> ...
>> else
>> r.restore(save_pt);//range knows how to do this efficiently
>> ...
>>
>
> Looks interesting and clearly has some advantages.
> I personally tend to alway write .save when I intend so and
> thought that way more phobos functions should take ranges by auto ref
> (i.e. actually advance them).

And .save is sort of the right thing to do. But I never bothered using it.

> But I recapture from an earlier discussion that most people liked the
> implicit save
> and some proposed to ditch save at all.
>

IMO implicit save is nice and well as long as it stays cheap e.g. 
arrays, plain in-memory stuff.
Once our hands are on efficiency things like buffered files it's all 
about being explicit enough.

> How would restore allow to actually duplicate the range at different
> positions.
> I think a pretty common use case for forward ranges is backtracking:

In your example it's duplication being used to do backtracking, it 
doesn't need to. There are use cases where you don't need a separate 
copy at all.

> auto start = range.save;
> popWhile!pred(range);
> doSomeThing(start, range);
>

For better or worse my idea is to ditch duplication ability completely. 
That's why I've been talking about InputRange specifically, it's sort of 
extension on it.
Now if we wanted to, it could be extend to forward ranges so they can do 
both: restore states and be duplicated.

> Will I need to be able to do this?

No the original intent is make new kind of InputRange that can go "back" 
to some state.
Thus:

> auto range2 = range; //same object
> range.restore(start); //resets to start
> doSomeThing(range, range2); // call with 2 identical objects at start, yikes!

-- 
Dmitry Olshansky