efficient input range buffering

Martin Nowak dawg at dawgfoto.de
Sun Oct 2 18:18:56 PDT 2011


On Mon, 03 Oct 2011 03:12:00 +0200, Timon Gehr <timon.gehr at gmx.ch> wrote:

> On 10/03/2011 01:14 AM, Martin Nowak wrote:
>> On Sun, 02 Oct 2011 21:15:12 +0200, Timon Gehr <timon.gehr at gmx.ch>  
>> wrote:
>>
>>> On 10/02/2011 04:12 PM, Martin Nowak wrote:
>>>> I've written a wrapper to promote input ranges to buffered forward
>>>> ranges.
>>>> It allows to write range agnostic lexers/parsers with infinite
>>>> lookahead.
>>>>
>>>> Buffering is done through a singly linked list of memory blocks that  
>>>> are
>>>> reference counted.
>>>> Each saved range holds a reference to all future blocks.
>>>> Blocks are recycled when being no longer used.
>>>>
>>>> https://gist.github.com/1257196
>>>>
>>>> There is a major issue with the somewhat broken
>>>> implicit-save-through-copy concept.
>>>> A lot of copies in function parameters, foreach loops etc. will also
>>>> create
>>>> references and thus can be easily responsible for inefficiencies.
>>>>
>>>> martin
>>>
>>> I have implemented something similar (but it is not generic). It uses
>>> a dynamic circular buffer on top of a single dynamic array. I believe
>>> it is more efficient that way for usual parsing tasks. (and uses less
>>> lines of code) The clients of the range have to explicitly state that
>>> they want to go back to a certain state. It works in LIFO order, which
>>> is sufficient for parsing with lookahead.
>>>
>> Which is about the same as I did where free memory blocks are recycled
>> and appended to the front. So to say a cyclic singly linked list.
>> Only you have two performance issues when using a ring buffer based on
>> arrays.
>> - The head reader needs to check if he can overwrite a location for
>> every element.
>
> It does not. The only position that is important is that of the very  
> first anchor, therefore there is only one check per buffer refill. (the  
> range saves the very first anchor, as well as how many anchors are  
> around).
>
>> - When resizing the buffer you need to copy the already read data
>> (possibly causing costly copy ctor invocations).
>
> The buffer is always doubled in size, quickly reaching the maximal size  
> needed (I start with a quite large buffer, usually it is never resized).  
> There are no invocations of the copy ctor, because the data can be moved.
>
>>
>> With block based reads you never need to copy read data and it also
>> frees from
>> checking for overwrites until the block end is reached.
>
> I never need to copy data, and moving data only happens on degenerate  
> input.
>
>
I see, little more elaborate than I guessed from your description.
Is the code public?


More information about the Digitalmars-d mailing list