[RFC] I/O and Buffer Range

Dmitry Olshansky dmitry.olsh at gmail.com
Thu Jan 16 10:44:08 PST 2014


16-Jan-2014 19:55, Steven Schveighoffer wrote:
> On Tue, 07 Jan 2014 05:04:07 -0500, Dmitry Olshansky
> <dmitry.olsh at gmail.com> wrote:
>> Then our goals are aligned. Be sure to take a peek at (if you haven't
>> already):
>> https://github.com/schveiguy/phobos/blob/new-io/std/io.d
>
> Yes, I'm gearing up to revisit that after a long D hiatus, and I came
> across this thread.
>
> At this point, I really really like the ideas that you have in this. It
> solves an issue that I struggled with, and my solution was quite clunky.
>
> I am thinking of this layout for streams/buffers:
>
> 1. Unbuffered stream used for raw i/o, based on a class hierarchy (which
> I have pretty much written)
> 2. Buffer like you have, based on a struct, with specific primitives.
> Its job is to collect data from the underlying stream, and present it
> to consumers as a random-access buffer.

The only interesting thing I'd add here is that some buffers may work
without an underlying stream. The best examples are arrays and
memory-mapped files.
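
A minimal sketch of a buffer with no stream underneath, assuming the data
already sits in an array. The primitive names (window, load, discard) are
hypothetical and chosen for illustration only; they are not the API from the
RFC.

```d
// Hypothetical sketch: a buffer backed directly by an array.
// There is no underlying stream, so nothing is ever refilled.
struct ArrayBuffer
{
    private const(ubyte)[] data; // the whole input is already in memory
    private size_t pos;          // start of the unconsumed window

    this(const(ubyte)[] source) { data = source; }

    // Range primitives, so the buffer can be walked one octet at a time.
    @property bool empty() const { return pos == data.length; }

    @property ubyte front() const
    {
        assert(pos < data.length, "front called on an empty buffer");
        return data[pos];
    }

    void popFront() { ++pos; }

    // Random-access view of everything not yet consumed.
    @property const(ubyte)[] window() const { return data[pos .. $]; }

    // Ask for at least n bytes in the window. With no stream to read from,
    // this only reports whether the bytes are already available.
    bool load(size_t n) const { return data.length - pos >= n; }

    // Drop n bytes from the front of the window.
    void discard(size_t n)
    {
        assert(n <= data.length - pos, "cannot discard past the end");
        pos += n;
    }
}
```

A memory-mapped file could plausibly expose the same primitives over its
mapped slice.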

> 3. Filter that has access to transform the buffer data/copy it.
> 4. Ranges that use the buffer/filter to process/present the data.
>

Yes, yes and yes. I'm pleasantly surprised to see that our visions seem to
match. I was half-expecting you'd come along and destroy it all ;)

> The problem I struggled with is the presentation of UTF data of any
> format as char[] wchar[] or dchar[]. 2 things need to happen. First is
> that the data needs to be post-processed to perform any necessary byte
> swapping. The second is to transcode the data into the correct width.
>
> In this way, you can process UTF data of any type (I even have code to
> detect the encoding and automatically process it), and then use it in a
> way that makes sense for your code.
>
> My solution was to paste in a "processing" delegate into the class
> hierarchy of buffered streams that allowed one read/write access to the
> buffer. But it's clunky, and difficult to deal with in a generalized
> fashion.
>
> But the idea of using a buffer in between the stream and the range, and
> possibly bolting together multiple transformations in a clean way, makes
> this problem easy to solve, and I think it is closer to the vision
> Andrei/Walter have.

In essence a transcoding filter for UTF-16 would wrap a buffer of ubyte 
and itself present a buffer interface (but of wchar).

My own stuff currently deals only in ubyte, and the limited decoding is
represented by a "decode" function that takes a buffer of ubyte and
decodes UTF-8. I think typed buffers/filters are the way to go.
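
A rough sketch of such a typed filter, under the simplifying assumption that
the wrapped "buffer" is just an input range of ubyte. Utf16Filter and
utf16Filter are hypothetical names, and only byte swapping into wchar code
units is shown, not surrogate-pair validation.

```d
import std.range;

// Hypothetical filter: wraps any input range of ubyte and presents it as an
// input range of wchar (UTF-16 code units), swapping bytes when needed.
struct Utf16Filter(R)
    if (isInputRange!R && is(ElementType!R : ubyte))
{
    private R source;
    private bool bigEndian;
    private wchar current;
    private bool exhausted;

    this(R source, bool bigEndian)
    {
        this.source = source;
        this.bigEndian = bigEndian;
        popFront(); // prime the first code unit
    }

    @property bool empty() const { return exhausted; }
    @property wchar front() const { return current; }

    void popFront()
    {
        if (source.empty) { exhausted = true; return; }
        immutable ubyte first = source.front;
        source.popFront();
        // A trailing odd byte is silently dropped in this sketch;
        // a real filter would report a truncated code unit instead.
        if (source.empty) { exhausted = true; return; }
        immutable ubyte second = source.front;
        source.popFront();
        current = bigEndian
            ? cast(wchar)((first << 8) | second)
            : cast(wchar)((second << 8) | first);
    }
}

// Convenience constructor so the range type is inferred.
auto utf16Filter(R)(R source, bool bigEndian)
{
    return Utf16Filter!R(source, bigEndian);
}
```

Because the filter is itself a range, further filters or parsing ranges could
be stacked on top of it in the same way.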

>
> I also like the idea of "pinning" the data instead of my mechanism of
> using a delegate (which was similar but not as general). It also has
> better opportunities for optimization.
>
> Other ideas that came to me that buffer filters could represent:
>
> * compression/decompression
> * encryption
>
> I am going to study your code some more and see how I can update my code
> to use it. I still need to maintain the std.stdio.File interface, and
> Walter is insistent that the initial state of stdout/err/in must be
> synchronous with C (which kind of sucks, but I have plans on how to make
> it not be so bad).

I seriously don't see how interfacing with the C runtime could be fast enough.

> There is still a lot of work left to do, but I think one of the hard
> parts is done, namely dealing with UTF transcoding. The remaining sticky
> part is dealing with shared. But with structs, this should make things
> much easier.

I'm thinking a generic locking wrapper is possible along the lines of:

shared Locked!(GenericBuffer!char) stdin; //usage

struct Locked(T){
shared:
private:
	T _this;
	Mutex mut;
public:
	//forwarded methods
}

The wrapper would introduce a lock and implement every method of the wrapped
struct roughly like this:
mut.lock();
scope(exit) mut.unlock();
(cast(T*)&_this).method(args); // cast away shared while the lock is held

I'm sure it could be pretty automatic.
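
One way the forwarding might be automated is opDispatch. The following is a
sketch only: it leaves out the shared qualifier and the cast to keep the
example small, and Counter is a hypothetical stand-in for the wrapped buffer
type.

```d
import core.sync.mutex : Mutex;

// Hypothetical sketch: opDispatch turns any call locked.method(args)
// into a lock / forward / unlock sequence on the wrapped value.
struct Locked(T)
{
    private T payload;
    private Mutex mut;

    this(T initial)
    {
        payload = initial;
        mut = new Mutex;
    }

    // Copying would duplicate the payload while sharing the mutex.
    @disable this(this);

    auto opDispatch(string name, Args...)(auto ref Args args)
    {
        mut.lock();
        scope(exit) mut.unlock();
        return mixin("payload." ~ name ~ "(args)");
    }
}

// Stand-in for a buffer type, used only to exercise the wrapper.
struct Counter
{
    int value;
    int increment(int by) { value += by; return value; }
}
```

With this, a call such as Locked!Counter(Counter(0)).increment(2) goes
through the mutex without any hand-written forwarding method.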

> One question, is there a reason a buffer type has to be a range at all?
> I can see where it's easy to make it a range, but I don't see
> higher-level code using the range primitives when dealing with chunks of
> a stream.

Lexers/parsers enjoy it, i.e. they work pretty much as ranges, especially
when skipping spaces and the like. As I said, the main reason was: if it
fits as a range, why not? After all, it makes one-pass processing of data
trivial, as it rides on top of foreach:

foreach(octet; mybuffer)
{
	if(interesting(octet))
		do_cool_stuff();
}

Things like countUntil make perfect sense when called on a buffer (e.g. to
find a matching sentinel).
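
For instance, a sentinel search could look like the sketch below, where a
plain ubyte slice stands in for the buffer's current window and lineLength is
a hypothetical helper name.

```d
import std.algorithm : countUntil;

// Hypothetical helper: length of the data before the first '\n' sentinel.
size_t lineLength(const(ubyte)[] window)
{
    immutable idx = window.countUntil(cast(ubyte) '\n');
    // countUntil returns -1 when the sentinel is absent. With a real buffer
    // that would be the cue to load more data; here the whole window is
    // reported instead.
    return idx < 0 ? window.length : cast(size_t) idx;
}
```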

-- 
Dmitry Olshansky

