stream == range ? [Sliding window]

Sun May 31 16:00:19 PDT 2015

On 01-Jun-2015 01:05, Dragos Carp wrote:
> On Sunday, 31 May 2015 at 17:50:41 UTC, Dmitry Olshansky wrote:
>> On 31-May-2015 18:58, Andrei Alexandrescu wrote:
>>> On 5/31/15 8:44 AM, Nick Sabalausky wrote:
>>>> On 05/30/2015 07:06 AM, short2cave wrote:
>>>>> Do the classic streams still make sense when we have Ranges ?
>>>>>
>>>>
>>>> I've been thinking the same thing lately. In fact, I'd been meaning to
>>>> make a post regarding that.
>>>>
>>>> Phobos's std.stream has been slated for an overhaul for awhile now, but
>>>> it seems to me that ranges are already 90% of the way to BEING a total
>>>> std.stream replacement:
>>>>
>>>> Output Streams: AFAICT, 100% covered by output ranges. Output streams
>>>> exist as a place for sticking arbitrary amounts of sequential data.
>>>> Output range's "put" does exactly that.
>>>>
>>>> Input Streams: Input ranges are very nearly a match for this. AFAICT,
>>>> The only thing missing here is the ability to "read" not just the one
>>>> "front" value, but to read the front N values as a chunk, without an
>>>> O(n) sequence of front/popFront. So we'd just need another "optional"
>>>> range characteristic: hasFrontN (or some such).
>>>
>>> Given that it seems Design by Introspection has been working well for us
>>> and we're continuing to enhance its use in Phobos, it seems to me that
>>> optional methods for ranges are the way to go.
>>>
>>> An optional method for any range is
>>>
>>> size_t bulkRead(T[] target);
>>>
>>> which fills as much as possible from target and returns the number of
>>> items copied.
>>>
>>> Another good candidate for optional methods is lookahead.
>>
>> Hardly. In fact, I've spent quite some time trying to figure this out.
>>
>> Doing bulk read to some array is the pushing burden of maintaining
>> some buffer on the user and adding the overhead of extra copy on
>> buffered streams. Not to mention that the more methods we put in range
>> API the more if/else forests we produce in our algorithms.
>
> LinearBuffer tries to address this problem in:
> https://github.com/D-Programming-Language/phobos/pull/2928
> This simply extends the existing std.array.Appender with popFrontN
> operations.
>
> The use case would be like this:
> 1. accumulate data in the LiniearBuffer by appending to the buffer
> 2. try to identify the next token
> 3. if a token is identified popFontN the length of the token
> 4. otherwise acumulate some more (go to 1.)
>

Yes, that's the pattern. Thoughts:
1. popFrontN is just confusing - this is not a range, let's not pretend 
it is.
2. As it stands it would require first copying into appender which is 
slow. The opposite direction is usually a better match - assume appender 
(or rather buffer) is filled to its capacity with data from the file and 
keeping this invariant true.

Otherwise it looks quite similar to the sliding window interface and 
that is great.

data --> window
popFrontN --> skip
put ~~> extend (probably should be just called 'more')

Actually I've dig up my original code for buffer, amended and integrated 
in one of Steven's Schveighoffer I/O package forks:

https://github.com/DmitryOlshansky/phobos/blob/new-io3/std/io/textbuf.d#L68
Now I recall it's not quite sliding window as it shrinks on reads...
Anyhow the same pattern goes with: data, discard and extend.

> I was also thinking about adding a new method:
> ptrdiff_t putFrom(ptrdiff_t delegate(T[]) read);
> meant to be used together with file read or socket receive functions. In
> this way the read/recv system functions can write directly into the
> input buffer without intermediate copies.

Yes, that is the missing piece in the current setup, something got to 
fill the buffer directly.

Parser or any other component down the line shouldn't concern itself 
with reading, just using some "more" primitive to auto-magically extend 
the window by at least X bytes. That would shrink-extend buffer  and 
load it behind the scenes.

Thus I think a user-facing primitive should be a composition of 3 
orthogonal pieces internally: underlying buffer, input handle and/or output.

Some ideas of how it comes together (except the mess with UTF endiannes):
https://github.com/DmitryOlshansky/phobos/blob/new-io3/std/io/textbuf.d#L183

-- 
Dmitry Olshansky