[RFC] I/O and Buffer Range

Dmitry Olshansky dmitry.olsh at gmail.com
Sat Jan 4 05:31:15 PST 2014


31-Dec-2013 22:46, Joseph Cassman wrote:
> On Tuesday, 31 December 2013 at 09:04:58 UTC, Dmitry Olshansky wrote:
>> 31-Dec-2013 05:53, Joseph Cassman wrote:
>>> On Sunday, 29 December 2013 at 22:02:57 UTC, Dmitry Olshansky wrote:

>> I'm thinking there might be a way to bridge the new range type with
>> ForwardRange but not directly as defined at the moment.
>>
>> A possibility I consider is to separate a Buffer object (not a range),
>> and let it be shared among views - light-weight buffer-ranges. Then if
>> we imagine that these light-weight buffer-ranges are working as marks
>> (i.e. they pin down the buffer) in the current proposal then they
>> could be forward ranges.

I've created a fork where I've implemented just that.
As a bonus, I also tweaked the stream primitives so it now works with pipes 
or whatever input stdin happens to be.

Links stay the same:
Docs: http://blackwhale.github.io/datapicked/dpick.buffer.traits.html
Code: https://github.com/blackwhale/datapicked/tree/fwd-buffer-range/dpick/buffer

The description has been greatly simplified and the primitive count reduced.

1. A buffer range is a forward range. It has reference semantics.
2. A copy produced by _save_ is an independent view of the underlying 
buffer (or window).
3. No bytes that are seen in some existing view can be discarded. Thus 
each reference pins its position in the buffer.
4. There are 3 new primitives:
    Range slice(BufferRange r);
Returns a slice of the window between the current range position and r. 
It must be a random access range.

    ptrdiff_t tell(BufferRange r);
Returns the difference between the positions of the current range and r 
in the window. Note that, unlike slice(r).length, this can be either 
positive or negative.

    bool seek(ptrdiff_t ofs);
Resets the buffer state to an offset from the current position. The 
return value indicates whether the operation succeeded. It may fail if 
there is not enough data or (if ofs is negative) if that portion of the 
data was already discarded.

5. Lookahead and lookbehind are extra primitives that were left intact 
for the moment. Where applicable, a range may provide lookahead:

Range lookahead(); // as much as is available in the window
Range lookahead(size_t n); // exactly n, or nothing if fewer are available

And lookbehind:

Range lookbehind(); // as much as is available in the window
Range lookbehind(size_t n); // exactly n, or nothing if fewer are available

These should probably be tested as separate traits.
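To make points 1 to 5 concrete, here is a toy in-memory sketch in D. It is not the dpick.buffer code: Buffer and BufRange are hypothetical names, the window is a fixed immutable array that is never discarded (so the pinning rule of point 3 holds trivially), and two details the description leaves open are assumptions here: tell returns r's position minus the current one, so that seek(tell(r)) moves the current view onto r, and slice accepts r on either side of the current position.

```d
import std.range.primitives : isForwardRange;
import std.stdio : writeln;

/// Hypothetical shared buffer object (not a range). In this toy the window is
/// a fixed immutable array that is never discarded, so pinning is trivial.
final class Buffer
{
    immutable(ubyte)[] window;
    this(immutable(ubyte)[] data) { window = data; }
}

/// Hypothetical light-weight view over a shared Buffer. Being a class it has
/// reference semantics; save() creates an independent view.
final class BufRange
{
    private Buffer buf;
    private size_t pos; // this view's position in the window

    this(Buffer buf, size_t pos = 0) { this.buf = buf; this.pos = pos; }

    // Forward range primitives
    @property bool empty() const { return pos >= buf.window.length; }
    @property ubyte front() const { return buf.window[pos]; }
    void popFront() { ++pos; }
    @property BufRange save() { return new BufRange(buf, pos); }

    /// Window slice between this view and r (assumes r views the same Buffer;
    /// assumption: r may be on either side of this view).
    immutable(ubyte)[] slice(BufRange r)
    {
        return pos <= r.pos ? buf.window[pos .. r.pos] : buf.window[r.pos .. pos];
    }

    /// Signed distance from this view to r (assumed sign: r.pos - pos).
    ptrdiff_t tell(BufRange r)
    {
        return cast(ptrdiff_t) r.pos - cast(ptrdiff_t) pos;
    }

    /// Move by ofs; fails if that runs past the data or before the window.
    bool seek(ptrdiff_t ofs)
    {
        immutable target = cast(ptrdiff_t) pos + ofs;
        if (target < 0 || target > cast(ptrdiff_t) buf.window.length)
            return false;
        pos = cast(size_t) target;
        return true;
    }

    // Optional lookahead/lookbehind: all that is available, or exactly n or nothing.
    immutable(ubyte)[] lookahead() { return buf.window[pos .. $]; }
    immutable(ubyte)[] lookahead(size_t n)
    {
        return pos + n <= buf.window.length ? buf.window[pos .. pos + n] : null;
    }
    immutable(ubyte)[] lookbehind() { return buf.window[0 .. pos]; }
    immutable(ubyte)[] lookbehind(size_t n)
    {
        return n <= pos ? buf.window[pos - n .. pos] : null;
    }
}

static assert(isForwardRange!BufRange);

void main()
{
    auto buf = new Buffer(cast(immutable(ubyte)[]) "key=value");
    auto r = new BufRange(buf);
    auto mark = r.save;                    // independent view at position 0
    while (!r.empty && r.front != '=')
        r.popFront();
    writeln(cast(string) mark.slice(r));   // key
    writeln(mark.tell(r));                 // 3
    r.popFront();                          // skip '='
    writeln(cast(string) r.lookahead());   // value
    writeln(cast(string) r.lookbehind(4)); // key=
    writeln(r.seek(-100));                 // false
}
```

Making the view a class is just the simplest way to get the reference semantics of point 1 in a sketch: copying a BufRange variable shares the position, while save() does not.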

>> input-source <--> buffer range <--> parser/consumer
>>
>> Meaning that if we can mix and match parsers with buffer ranges, and
>> buffer ranges with input sources, we will have grown something powerful indeed.
>
> Being able to wrap an already-in-use range object with the buffer
> interface as you do in the sample code
> (https://github.com/blackwhale/datapicked/blob/master/dgrep.d) is good
> for composability. Also allows for existing functionality in
> std.algorithm to be reused as-is.

It was more about wrapping an array, but it has to integrate well with 
what we have. I could imagine a use case for buffering an input range; 
in that case, I think a buffer range of elements other than bytes would be in order.
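Buffering an arbitrary input range might look roughly like this. It is a sketch under simple assumptions: Buffered and buffered are hypothetical names, the window is a GC array that grows on demand, and unlike the buffer range described above it is a plain value type rather than a forward range. What it illustrates is that the window's element type follows ElementType!R, so the same lookahead works over tokens, dchar or ints, not only bytes.

```d
import std.range.primitives : ElementType, isInputRange, empty, front, popFront;
import std.stdio : writeln;

/// Hypothetical wrapper that buffers any input range R so a consumer can
/// look ahead without consuming. The window holds ElementType!R, not bytes.
struct Buffered(R) if (isInputRange!R)
{
    alias E = ElementType!R;
    private R src;
    private E[] window; // pulled from src but not yet consumed

    this(R src) { this.src = src; }

    /// Pull from src until at least n elements are buffered.
    /// Returns false if src ran dry first.
    private bool fill(size_t n)
    {
        while (window.length < n && !src.empty)
        {
            window ~= src.front;
            src.popFront();
        }
        return window.length >= n;
    }

    @property bool empty() { return !fill(1); }
    @property E front() { fill(1); return window[0]; }
    void popFront() { fill(1); window = window[1 .. $]; }

    /// Exactly n elements ahead, or an empty (null) slice if not available.
    E[] lookahead(size_t n) { return fill(n) ? window[0 .. n] : null; }
}

auto buffered(R)(R src) { return Buffered!R(src); }

void main()
{
    auto tokens = buffered(["let", "x", "=", "1"]); // elements are strings
    writeln(tokens.lookahead(2));         // ["let", "x"]
    tokens.popFront();
    writeln(tokens.front);                // x
    writeln(tokens.lookahead(5).length);  // 0: only 3 elements left
}
```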

> I think the new range type could also be added directly to some new, or
> perhaps retrofitted into existing, code to add the new functionality
> without sacrificing performance. In that way the internal payload
> already used to get the data (say by the input range) could be reused
> without having to allocate new memory to support the buffer API.
>
> As one idea of using a buffer range from the start, a function template
> by(T) (where T is ubyte, char, wchar, or dchar) could be added to
> std.stdio.

IMHO, C run-time I/O has no place in D. The amount of work spent on 
special-casing the non-locking primitives of each C run-time,
repeating legacy mistakes (like text mode, codepages and locales) and 
stumbling on portability problems (getc is a macro we can't use from D) would 
have been better spent elsewhere: designing our own I/O framework.

I've put together something pretty simple and fast, a buffer range 
directly on top of native I/O:
https://github.com/blackwhale/datapicked/blob/fwd-buffer-range/dpick/buffer/stream.d

It needs better error messages than naked enforce and some 
tweaks to memory management. Even so, it already runs circles around the 
existing std.stdio.
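For a rough idea of what building directly on native I/O involves, here is a POSIX-only sketch; it is not the linked stream.d. FdBuffer, refill and discard are hypothetical names. It refills the window with read(2) on a file descriptor and simply loops on whatever read returns, so short reads from pipes and terminals are handled the same way as regular files. Bytes are compacted to the front only after the caller discards them, which is where pinned views would come in, and errnoEnforce from std.exception at least attaches the errno message that a naked enforce lacks. A Windows version would need ReadFile instead.

```d
import core.stdc.string : memmove;
import core.sys.posix.unistd : read;
import std.exception : errnoEnforce;
import std.stdio : writeln;

/// Hypothetical minimal buffer over a POSIX file descriptor (POSIX only).
struct FdBuffer
{
    int fd;
    ubyte[] store;      // backing memory
    size_t start, end;  // the current window is store[start .. end]

    this(int fd, size_t capacity = 64 * 1024)
    {
        this.fd = fd;
        store = new ubyte[capacity];
    }

    /// Bytes currently held in memory.
    ubyte[] window() { return store[start .. end]; }

    /// Drop the first n bytes of the window; the caller promises that
    /// no view still pins them.
    void discard(size_t n) { start += n; }

    /// Append more input to the window. Returns false at end of input.
    /// read() may return fewer bytes than asked (pipes, terminals),
    /// which is fine: callers simply loop.
    bool refill()
    {
        if (start > 0) // compact: slide the still-needed bytes to the front
        {
            memmove(store.ptr, store.ptr + start, end - start);
            end -= start;
            start = 0;
        }
        if (end == store.length) // window is full and fully pinned: grow
            store.length = store.length * 2;
        auto n = read(fd, store.ptr + end, store.length - end);
        errnoEnforce(n >= 0, "read failed");
        end += cast(size_t) n;
        return n > 0;
    }
}

void main()
{
    // Count lines on stdin, whether it is a file, a terminal or a pipe.
    auto buf = FdBuffer(0);
    size_t lines;
    while (buf.refill())
    {
        foreach (b; buf.window)
            if (b == '\n')
                ++lines;
        buf.discard(buf.window.length); // nothing is pinned in this loop
    }
    writeln(lines);
}
```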

> It would return a buffer range object providing more
> functionality than byChunk or byLine while adding access to the entire
> stream of data in a file in a contiguous and yet efficient manner.

Drop 'efficient' if we are talking about interfacing with the C run-time. 
Otherwise, yes, absolutely.

> Seems
> to help with the issues faced in processing file data mentioned in
> previous comments in this thread.


-- 
Dmitry Olshansky


More information about the Digitalmars-d mailing list