efficient input range buffering

Sun Oct 2 16:14:25 PDT 2011

On Sun, 02 Oct 2011 21:15:12 +0200, Timon Gehr <timon.gehr at gmx.ch> wrote:

> On 10/02/2011 04:12 PM, Martin Nowak wrote:
>> I've written a wrapper to promote input ranges to buffered forward  
>> ranges.
>> It allows to write range agnostic lexers/parsers with infinite  
>> lookahead.
>>
>> Buffering is done through a singly linked list of memory blocks that are
>> reference counted.
>> Each saved range holds a reference to all future blocks.
>> Blocks are recycled when being no longer used.
>>
>> https://gist.github.com/1257196
>>
>> There is a major issue with the somewhat broken
>> implicit-save-through-copy concept.
>> A lot of copies in function parameters, foreach loops etc. will also  
>> create
>> references and thus can be easily responsible for inefficiencies.
>>
>> martin
>
> I have implemented something similar (but it is not generic). It uses a  
> dynamic circular buffer on top of a single dynamic array. I believe it  
> is more efficient that way for usual parsing tasks. (and uses less lines  
> of code) The clients of the range have to explicitly state that they  
> want to go back to a certain state. It works in LIFO order, which is  
> sufficient for parsing with lookahead.
>
Which is about the same as I did where free memory blocks are recycled
and appended to the front. So to say a cyclic singly linked list.
Only you have two performance issues when using a ring buffer based on  
arrays.
  - The head reader needs to check if he can overwrite a location for every  
element.
  - When resizing the buffer you need to copy the already read data  
(possibly causing costly copy ctor invocations).

With block based reads you never need to copy read data and it also frees  
from
checking for overwrites until the block end is reached.

> auto a=r.pushAnchor(); // remember current state in a
> r.popFront(); // do some lookahead
> auto b=r.pushAnchor(); // remember current state in b
> r.popFront(); // do some lookahead
> r.popFront(); // ditto
> r.popAnchor(b); // go back to b
> r.popFront(); // do lookahead again
> r.popAnchor(a); // go back to a
>
> The buffer will dynamically be resized if a lot of lookahead is required  
> and be reused otherwise. It has minimal overhead and it uses the  
> execution stack to store the states (just indexes into the original  
> range). The final buffer size is less than twice the largest lookahead  
> required.
>
>
>
>
>
>
I also begin to think that the implicit save is not very useful and we  
should
rather use a paired restore.
Maybe the ForwardRange concept can be extended to
is(typeof(range.save) == typeof(range)) ||  
is(typeof(range.restore(range.save))).
But as Dmitry pointed out it is not always possible to allow duplication
so maybe this requires a new concept of restoreable InputRange, something  
in between
InputRange and ForwardRange.