[RFC] I/O and Buffer Range

Thu Jan 16 14:28:31 PST 2014

17-Jan-2014 00:00, Steven Schveighoffer wrote:
> On Thu, 16 Jan 2014 13:44:08 -0500, Dmitry Olshansky
> <dmitry.olsh at gmail.com> wrote:
>
>> 16-Jan-2014 19:55, Steven Schveighoffer wrote:
>>> On Tue, 07 Jan 2014 05:04:07 -0500, Dmitry Olshansky
>>> <dmitry.olsh at gmail.com> wrote:
[snip]

>> In essence a transcoding filter for UTF-16 would wrap a buffer of
>> ubyte and itself present a buffer interface (but of wchar).
>
> My intended interface allows you to specify the desired type per read.
> Think of the case of stdin, where the clients will be varied and written
> by many different people, and its interface is decided by Phobos.
>
> But a transcoding buffer may make some optimizations. For instance,
> reading a UTF32 file as utf-8 can re-use the same buffer, as no code
> unit uses more than 4 code points (did I get that right?).
>

The other way around :) Up to 4 code units per code point.
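
To illustrate the buffer-reuse point, here is a minimal sketch of transcoding UTF-32 to UTF-8 inside the same memory. It is not code from the proposed buffer design: transcodeInPlace is a hypothetical helper, and a plain dchar[] stands in for the buffer. It relies only on the fact that one code point takes at most 4 UTF-8 code units, so the write position never overtakes the read position.

```d
import std.utf : encode;

// Hypothetical helper: rewrites UTF-32 data as UTF-8 in the same memory
// and returns the slice holding the UTF-8 code units.
char[] transcodeInPlace(dchar[] src)
{
    auto bytes = cast(char[]) src; // same memory, viewed as UTF-8 code units
    size_t w = 0;                  // write position, in bytes
    foreach (i; 0 .. src.length)
    {
        dchar c = src[i];          // read the code point before overwriting it
        char[4] tmp;
        size_t n = encode(tmp, c); // n is 1..4 code units
        bytes[w .. w + n] = tmp[0 .. n];
        w += n;                    // w never exceeds 4 * (i + 1)
    }
    return bytes[0 .. w];
}

void main()
{
    dchar[] text = "h\u00E9llo \u20AC"d.dup;
    auto utf8 = transcodeInPlace(text);
    assert(utf8 == "h\u00E9llo \u20AC");
    assert(utf8.length == 10); // 1 + 2 + 1 + 1 + 1 + 1 + 3 code units
}
```

The reverse direction (UTF-8 to UTF-32) cannot reuse the buffer this way, since the output may be up to four times larger than the input.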

>>> I am going to study your code some more and see how I can update my code
>>> to use it. I still need to maintain the std.stdio.File interface, and
>>> Walter is insistent that the initial state of stdout/err/in must be
>>> synchronous with C (which kind of sucks, but I have plans on how to make
>>> it not be so bad).
>>
>> I'm seriously not seeing how interfacing with the C runtime could be
>> fast enough.
>
> It's not. But an important stipulation in order for this to all be
> accepted is that it doesn't break existing code that expects things like
> printf and writef to interleave properly.
>
> However, I think we can have an opt-in scheme, and there are certain
> cases where we can proactively switch to a D-buffer scheme. For example,
> if you get a ByLine range, it expects to exhaust the data from stream,
> and may not properly work with C printf.
>
> The idea is that stdio.File can switch at runtime from FILE * to D
> streams as needed or directed.
>
>>> There is still a lot of work left to do, but I think one of the hard
>>> parts is done, namely dealing with UTF transcoding. The remaining sticky
>>> part is dealing with shared. But with structs, this should make things
>>> much easier.
>>
>> I'm thinking a generic locking wrapper is possible along the lines of:
>>
>> shared Locked!(GenericBuffer!char) stdin; //usage
>>
>> struct Locked(T){
>> shared:
>> private:
>>     T _this;
>>     Mutex mut;
>> public:
>>     //forwarded methods
>> }
>>
>> The wrapper will introduce a lock, and implement every method of
>> wrapped struct roughly like this:
>> mut.lock();
>> scope(exit) mut.unlock();
>> (cast(T*)&_this).method(args);
>>
>> I'm sure it could be pretty automatic.
>
> This would be a key addition for ANY type in order to properly work with
> shared. BUT, I don't see how it works safely generically because you
> necessarily have to cast away shared in order to call the methods. You
> would have to limit this to only working on types it was intended for.

The requirement may be that it's pure, or should I say "well-contained".
In other words, as long as it doesn't smuggle references somewhere else
it should be fine.
That is not to say it's 100% fool-proof, nor do I think that essentially
simulating a synchronized class is always a good thing to do...

> I've been expecting to have to do something like this, but not looking
> forward to it :(

>>> One question, is there a reason a buffer type has to be a range at all?
>>> I can see where it's easy to make it a range, but I don't see
>>> higher-level code using the range primitives when dealing with chunks of
>>> a stream.
>>
>> Lexers/parsers enjoy it - i.e. they work pretty much as ranges
>> especially when skipping spaces and the like. As I said the main
>> reason was: if it fits as range why not? After all it makes one-pass
>> processing of data trivial as it rides on top of foreach:
>>
>> foreach(octet; mybuffer)
>> {
>>     if(interesting(octet))
>>         do_cool_stuff();
>> }
>>
>> Things like countUntil make perfect sense when called on buffer (e.g.
>> to find matching sentinel).
>>
>
> I think I misstated my question. What I am curious about is why a type
> must be a forward range to pass isBuffer. Of course, if it makes sense
> for a buffer type to also be a range, it can certainly implement that
> interface as well. But I don't know that I would need those primitives
> in all cases. I don't have any specific use case for having a buffer
> that doesn't implement a range interface, but I am hesitant to
> necessarily couple the buffer interface to ranges just because we can't
> think of a counter-case :)

"Convenient to work with" sounds like reason enough to me. I simply see
no need to reinvent std.algorithm on buffers, especially for the
algorithms that just scan left-to-right.
An example would be calculating a checksum of a stream (say the data
comes from a pipe or socket, i.e. it is buffered). It's a trivial
application of std.algorithm.reduce and there is no need to reinvent
that wheel IMHO.
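
A rough sketch of what I mean, assuming the buffer can be consumed as an input range of ubyte. Here a plain ubyte[] stands in for the buffer, and checksum is a made-up name for a simple additive checksum, not a real-world algorithm such as CRC32:

```d
import std.algorithm : reduce;

// Hypothetical additive checksum: sum of all bytes, wrapping at 2^32.
// Works on any input range whose elements are ubyte.
uint checksum(R)(R data)
{
    return reduce!((uint acc, ubyte b) => acc + b)(0u, data);
}

void main()
{
    ubyte[] chunk = [1, 2, 3, 250]; // stand-in for buffered stream data
    assert(checksum(chunk) == 256);
}
```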

-- 
Dmitry Olshansky