buffered input

Andrei Alexandrescu SeeWebsiteForEmail at erdani.org
Sat Feb 5 19:22:23 PST 2011


On 2/5/11 7:28 PM, Don wrote:
> spir wrote:
>> On 02/05/2011 10:44 PM, Nick Sabalausky wrote:
>>> "Andrei Alexandrescu"<SeeWebsiteForEmail at erdani.org> wrote in message
>>> news:iijq99$1a5o$1 at digitalmars.com...
>>>> On 2/5/11 6:46 AM, spir wrote:
>>>>> On 02/05/2011 10:36 AM, Nick Sabalausky wrote:
>>>>>> On a separate note, I think a good way of testing the design (and
>>>>>> end up
>>>>>> getting something useful anyway) would be to try to use it to
>>>>>> create a
>>>>>> range
>>>>>> that's automatically-buffered in a more traditional way. Ie, Given
>>>>>> any
>>>>>> input
>>>>>> range 'myRange', "buffered(myRange, 2048)" (or something like that)
>>>>>> would
>>>>>> wrap it in a new input range that automatically buffers the next 2048
>>>>>> elements (asynchronously?) whenever its internal buffer is
>>>>>> exhausted. Or
>>>>>> something like that. It's late and I'm tired and I can't think
>>>>>> anymore
>>>>>> ;)
>>>>>
>>>>> That's exactly what I'm expecting.
>>>>> Funnily enough, I was about to start a thread on the topic after
>>>>> reading
>>>>> related posts. My point was:
>>>>> "I'm not a specialist in efficiency (rather the opposite), I just know
>>>>> there is --theoretically-- relevant performance loss to expect from
>>>>> unbuffered input in various cases. Could we define a generic
>>>>> input-buffering primitive allowing people to benefit from others'
>>>>> competence? Just like Appender."
>>>>>
>>>>> Denis
>>>>
>>>> The buffered range interface as I defined it supports infinite
>>>> lookahead.
>>>> The interface mentioned by Nick has lookahead between 1 and 2048. So I
>>>> don't think my interface is appropriate for that.
>>>>
>>>> Infinite lookahead is a wonderful thing. Consider reading lines from a
>>>> file. Essentially what you need to do is to keep on reading blocks
>>>> of data
>>>> until you see \n (possibly followed by some more stuff). Then you offer
>>>> the client the line up to the \n. When the client wants a new line, you
>>>> combine the leftover data you already have with new stuff you read. On
>>>> occasion you need to move over leftovers, but if your internal
>>>> buffers are
>>>> large enough that is negligible (I actually tested this recently).
>>>>
>>>> Another example: consider dealing with line continuations in reading
>>>> CSV
>>>> files. Under certain conditions, you need to read one more line and
>>>> stitch
>>>> it with the existing one. This is easy with infinite lookahead, but
>>>> quite
>>>> elaborate with lookahead 1.
>>>>
>>>
>>> I think I can see how it might be worthwhile to discourage the
>>> traditional
>>> buffer interface I described in favor of the above. It wouldn't be as
>>> trivial to use as what people are used to, but I can see that it
>>> could avoid
>>> a lot of unnecessary copying, especially with other people's
>>> suggestion of
>>> allowing the user to provide their own buffer to be filled (and it seems
>>> easy enough to learn).
>>>
>>> But what about when you want a circular buffer? Ie, When you know a
>>> certain
>>> maximum lookahead is fine and you want to minimize memory usage and
>>> buffer-appends. Circular buffers don't do infinite lookahead so the
>>> interface maybe doesn't work as well. Plus you probably wouldn't want to
>>> provide an interface for slicing into the buffer, since the slice could
>>> straddle the wrap-around point which would require a new allocation (ie
>>> "return buffer[indexOfFront+sliceStart..$] ~
>>> buffer[0..sliceLength-($-(frontIndex+sliceStart))]"). I guess maybe that
>>> would just call for another type of range.
>>
>> Becomes too complicated, doesn't it?
>>
>> Denis
>
> Circular buffers don't seem like an 'optional' use case to me. Most real
> I/O works that way.
> I think if the abstraction can't handle it, the abstraction is a failure.

The abstraction does handle it implicitly, except that it doesn't fix 
the buffer size. If you keep calling appendToFront() with large counts 
without ever calling shiftFront(), the buffer will ultimately grow to 
accommodate the entire input. That's the infinite in infinite 
lookahead.
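
To make that concrete, here's a rough sketch (not a settled 
implementation) of reading one line against such an interface. It 
assumes front exposes the current window as a slice, appendToFront(n) 
reads roughly n more elements and returns how many it actually added, 
and shiftFront(n) releases the first n elements of the window; those 
exact signatures are assumptions on my part, not a finished design.

ubyte[] nextLine(R)(ref R r)
{
    import std.algorithm : countUntil;

    // Widen the window until it contains '\n' or the input runs out.
    // (Assumes appendToFront returns how many elements it actually added.)
    while (r.front.countUntil('\n') < 0 && r.appendToFront(4096) > 0) {}

    auto i = r.front.countUntil('\n');
    auto n = (i < 0) ? r.front.length : i + 1;
    auto line = r.front[0 .. n].dup; // copy before releasing the window
    r.shiftFront(n);
    return line;
}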

Most uses, however, stick with front/popFront (i.e. let the range choose 
the buffer size and handle circularity transparently) and only on 
occasion call appendToFront()/shiftFront(). Whenever part of the buffer 
has been released by calling shiftFront(), the implementation is free to 
reuse that space circularly.
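
The CSV case mentioned above falls out of the same occasional use of 
appendToFront()/shiftFront(). Again only a sketch under the same assumed 
primitives: a record stays open while it contains an odd number of '"' 
characters (a quoted field spanning a newline), so we just keep widening 
the window until the quotes balance.

ubyte[] nextCsvRecord(R)(ref R r)
{
    import std.algorithm : count, countUntil;

    size_t start = 0;
    for (;;)
    {
        auto i = r.front[start .. $].countUntil('\n');
        if (i < 0)
        {
            if (r.appendToFront(4096) == 0) break; // input exhausted
            continue;                              // rescan the wider window
        }
        auto end = start + i + 1;
        if (r.front[0 .. end].count('"') % 2 == 0) // quotes balanced: done
        {
            auto record = r.front[0 .. end].dup;
            r.shiftFront(end);
            return record;
        }
        start = end; // the newline was inside a quoted field; keep going
    }
    auto rest = r.front.dup; // whatever is left is the last record
    r.shiftFront(rest.length);
    return rest;
}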

Circularity is all transparent, as I think it should be. But the power 
is there. Consider FILE* for contrast. The C API provides setbuf and 
setvbuf, but no way to otherwise take advantage of buffering - at all. 
This really hurts; you can only call ungetc() once and be guaranteed 
success - i.e. every FILE* offers only a single character of lookahead, 
no matter how big a buffer you set!
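
For the record, here's roughly what that looks like through 
core.stdc.stdio (illustrative only; the file name is a placeholder). 
setvbuf happily accepts a 64 KB buffer, yet the C standard still 
guarantees only one character of pushback via ungetc:

import core.stdc.stdio;

void main()
{
    auto f = fopen("data.txt", "r"); // placeholder file name
    if (f is null) return;
    scope(exit) fclose(f);

    // Hand the stream a large buffer - must happen before any I/O on it.
    static char[64 * 1024] buf;
    setvbuf(f, buf.ptr, _IOFBF, buf.length);

    int a = fgetc(f);
    int b = fgetc(f);
    ungetc(b, f);    // one pushback: guaranteed to succeed
    // ungetc(a, f); // a second pushback may fail; only one is guaranteed
}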


Andrei

