std.stream replacement

Tue Mar 5 11:12:58 PST 2013

05-Mar-2013 22:49, Steven Schveighoffer wrote:
> On Tue, 05 Mar 2013 11:43:59 -0500, Dmitry Olshansky
> <dmitry.olsh at gmail.com> wrote:
>
>
>> That's it.
>> C's iobuf stuff and locks around (f)getc are one reason for it being
>> slower. In D we need no stinkin' locks as stuff is TLS by default.
>>
>> Plus as far as I understand your std.io idea it was focused around
>> filling up user-provided buffers directly without obligatory double
>> buffering somewhere inside like C does.
>
> You are right about the locking, though shared streams like stdout will
> need to be locked (this is actually one of the more difficult parts to
> do, and I haven't done it yet.  Shared is a pain to work with, the
> current File struct cheats with casting, I think I will have to do
> something like that).

But at least these are already shared :) In fact, shared is meant to be 
a pain in the ass (but I agree it should get some more convenience).

The key point is that shared should have been the user's problem.
But writeln and its ilk are too darn common, so some locking scheme has
to be baked in to ease the pain.

> File does a pretty good job of locking for an
> entire operation (i.e. an entire writeln/readf).

I just hope it doesn't then call C functions that lock internally...

> C iobuf I think tries to avoid double buffering for some things (e.g.
> gcc's getline), but std.io takes that to a new level.

Yeah, AFAIK it translates calls for, say, a few megabytes of data into
direct read/write OS syscalls. Hard to say how reliable their heuristics
are.

> With std.io you have SAFE access directly to the buffer.  So instead of
> getline being "read directly into my buffer, or copy into my buffer",
> it's "make sure there is a complete line in the file buffer, then give
> me a slice to it".  What's great about this is, you don't need to hack
> phobos to get buffer access like you need to hack C's stream to get
> buffer access to create something like getline.  So many more
> possibilities exist.
>
> So things like parsing xml files need no double buffering at all, AND
> you don't even have to provide a buffer!

Slicing the internal buffer is real darn nice. Hard to stress it enough ;)
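
To make the "ensure a complete line is in the buffer, then hand out a
slice" idea concrete, here is a minimal sketch. It is written in Python
purely for illustration; SlicingReader, read_line and the capacity
parameter are hypothetical names, not the std.io API, and the growth
policy (double the buffer when one line does not fit) is an assumption.

```python
import io


class SlicingReader:
    """Hypothetical buffered reader that hands out slices of its own buffer."""

    def __init__(self, raw, capacity=16):
        self.raw = raw                  # any object with readinto()
        self.buf = bytearray(capacity)  # internal stream buffer
        self.start = 0                  # first unconsumed byte
        self.end = 0                    # one past the last valid byte

    def _fill(self):
        """Pull more bytes from the source; return False at end of input."""
        if self.start > 0:
            # Slide the unconsumed bytes to the front of the buffer.
            n = self.end - self.start
            self.buf[:n] = self.buf[self.start:self.end]
            self.start, self.end = 0, n
        if self.end == len(self.buf):
            # One line is larger than the buffer: move to a bigger one.
            # Views handed out earlier keep the old bytearray alive.
            bigger = bytearray(2 * len(self.buf))
            bigger[:self.end] = self.buf[:self.end]
            self.buf = bigger
        got = self.raw.readinto(memoryview(self.buf)[self.end:])
        self.end += got or 0
        return bool(got)

    def read_line(self):
        """Return a view of the next line (terminator included), or None.

        The view aliases the internal buffer, so it is only meaningful
        until the next read_line() call.
        """
        while True:
            nl = self.buf.find(b"\n", self.start, self.end)
            if nl != -1:
                stop = nl + 1
                break
            if not self._fill():
                if self.start == self.end:
                    return None         # nothing left at all
                stop = self.end         # last line has no terminator
                break
        line = memoryview(self.buf)[self.start:stop]  # no copy made here
        self.start = stop
        return line


reader = SlicingReader(io.BytesIO(b"first\nsecond\n"))
while (line := reader.read_line()) is not None:
    print(bytes(line))
```

The caller never supplies a buffer and no line is copied unless the
caller asks for a copy (bytes(line) above); the price is that a slice
is only good until the next read.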

There is one nice abstraction I found while helping out on the D lexer
written in D; I call it a mark-slice range. It seems to be an extension
of a forward range.

It's all about buffering and marking a position in the input such that
you no longer care about anything before it. This means that everything
from the marked point onward needs to be kept in the buffer, while
everything prior to it can be discarded. The second operation, "slice",
returns a slice of the internal buffer from the last mark to the current
position.
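
A rough sketch of what I mean by mark and slice, in Python just to keep
it short. MarkSliceInput, words and the chunk size are made-up names
for illustration, and refilling by copying the kept tail is an
assumption of the sketch, not a claim about how std.io or the lexer
does it.

```python
import io


class MarkSliceInput:
    """Hypothetical mark-slice input over any object with read(n)."""

    def __init__(self, source, chunk=8):
        self.source = source
        self.chunk = chunk
        self.buf = b""    # bytes kept since the last mark, plus lookahead
        self.pos = 0      # current position, as an index into buf
        self.marked = 0   # marked position, as an index into buf

    def _refill(self):
        """Drop everything before the mark, then append a fresh chunk."""
        data = self.source.read(self.chunk)
        self.buf = self.buf[self.marked:] + data
        self.pos -= self.marked
        self.marked = 0
        return bool(data)

    @property
    def empty(self):
        # Only reports True once the source itself is exhausted.
        return self.pos == len(self.buf) and not self._refill()

    @property
    def front(self):
        assert not self.empty
        return self.buf[self.pos]

    def pop_front(self):
        assert not self.empty
        self.pos += 1

    def mark(self):
        """Declare that nothing before the current position is needed."""
        self.marked = self.pos

    def slice(self):
        """View of the buffer from the last mark to the current position."""
        return memoryview(self.buf)[self.marked:self.pos]


def words(inp):
    """Tiny lexer: split the input on spaces using mark/slice only."""
    out = []
    while not inp.empty:
        if inp.front == ord(" "):
            inp.pop_front()
            continue
        inp.mark()
        while not inp.empty and inp.front != ord(" "):
            inp.pop_front()
        out.append(bytes(inp.slice()))
    return out


print(words(MarkSliceInput(io.BytesIO(b"mark  slice range"), chunk=4)))
```

The lexer never sees chunk boundaries: a token that straddles two reads
survives because the refill keeps everything from the mark onward.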

It would be interesting to see how it correlates with buffered I/O in
std.io; what you've said so far fits the bill.

> Note that it is still possible to provide a buffer, in case that is what
> you want to do, and it will only copy any data already in the stream
> buffer.

So if I use my own buffers exclusively there is nothing to worry about 
(no copy this - copy that)?

> Everything else is read directly in (I have some heuristics to
> try and prevent tiny reads, so if you want to say read 4 bytes, it will
> first fill the stream buffer, then copy 4 bytes).

This seems a bit like the C one, iff it's a smart libc. What if instead
you read more than requested into the target buffer (if it fits)? You
can tweak the definition of read to say "buffer no less than X bytes,
the actual amount is returned" :)

And if one wants the direct and dumb way of "get me these 4 bytes", just
let them provide a fixed buffer of 4 bytes in total; then std.io can't
read more than that. (Could be useful to bench the OS I/O layer and
such.) Another consequence is that std.io wouldn't need to allocate an
internal buffer eagerly for tiny reads (in case they actually show up).
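
Here is roughly the read contract I have in mind, sketched in Python
for brevity. read_at_least and Trickle are hypothetical names invented
for this example; Trickle only exists to simulate a source that
returns a few bytes per call.

```python
def read_at_least(source, target, minimum):
    """Read no fewer than `minimum` bytes straight into `target`.

    Keeps reading while fewer than `minimum` bytes have arrived, and
    accepts whatever extra fits in `target`. Returns the actual amount,
    which is smaller than `minimum` only at end of input.
    """
    if minimum > len(target):
        raise ValueError("minimum exceeds the target buffer size")
    view = memoryview(target)
    filled = 0
    while filled < minimum:
        got = source.readinto(view[filled:])  # no intermediate buffer
        if not got:                           # end of input
            break
        filled += got
    return filled


class Trickle:
    """Hypothetical source that hands out at most `step` bytes per call."""

    def __init__(self, data, step):
        self.data = data
        self.step = step

    def readinto(self, b):
        n = min(self.step, len(b), len(self.data))
        b[:n] = self.data[:n]
        self.data = self.data[n:]
        return n


src = Trickle(b"0123456789", step=3)

big = bytearray(8)
n = read_at_least(src, big, 4)   # may return more than 4 if it fits
print(n, bytes(big[:n]))

four = bytearray(4)              # the "direct and dumb" way: exactly 4
n = read_at_least(src, four, 4)
print(n, bytes(four[:n]))
```

With a 4-byte target the function cannot over-read by construction, so
nothing has to be parked in an internal buffer for later.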

-- 
Dmitry Olshansky