efficient input range buffering

Sun Oct 2 12:15:12 PDT 2011

On 10/02/2011 04:12 PM, Martin Nowak wrote:
> I've written a wrapper to promote input ranges to buffered forward ranges.
> It allows to write range agnostic lexers/parsers with infinite lookahead.
>
> Buffering is done through a singly linked list of memory blocks that are
> reference counted.
> Each saved range holds a reference to all future blocks.
> Blocks are recycled when being no longer used.
>
> https://gist.github.com/1257196
>
> There is a major issue with the somewhat broken
> implicit-save-through-copy concept.
> A lot of copies in function parameters, foreach loops etc. will also create
> references and thus can be easily responsible for inefficiencies.
>
> martin

I have implemented something similar (but it is not generic). It uses a 
dynamic circular buffer on top of a single dynamic array. I believe it 
is more efficient that way for usual parsing tasks. (and uses less lines 
of code) The clients of the range have to explicitly state that they 
want to go back to a certain state. It works in LIFO order, which is 
sufficient for parsing with lookahead.

auto a=r.pushAnchor(); // remember current state in a
r.popFront(); // do some lookahead
auto b=r.pushAnchor(); // remember current state in b
r.popFront(); // do some lookahead
r.popFront(); // ditto
r.popAnchor(b); // go back to b
r.popFront(); // do lookahead again
r.popAnchor(a); // go back to a

The buffer will dynamically be resized if a lot of lookahead is required 
and be reused otherwise. It has minimal overhead and it uses the 
execution stack to store the states (just indexes into the original 
range). The final buffer size is less than twice the largest lookahead 
required.