stream interfaces - with ranges

Thu May 17 10:54:00 PDT 2012

On Thu, 17 May 2012 11:46:18 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail at erdani.org> wrote:

> On 5/17/12 9:02 AM, Steven Schveighoffer wrote:
>> Roughly speaking, not all the details are handled, but this makes a
>> feasible input range that will perform quite nicely for things like
>> std.algorithm.copy. I haven't checked, but copy should be able to handle
>> transferring a range of type T[] to an output range with element type T;
>> if it's not able to, it should be made to work.
>
> We can do this for copy, but if we need to specialize a lot of other  
> algorithms, maybe we didn't strike the best design.

Right.  The thing is, buffered streams are good as plain ranges for one  
thing -- forwarding data.  There probably aren't many algorithms in  
std.algorithm that are applicable.  And there is always the put idiom:  
Appender.put(buf) should work to accumulate all data into an array, which  
can then be used as a normal range.
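
A minimal sketch of that put idiom, for illustration only.  The helper  
name accumulate and the int[][] stand-in for successive buffer contents  
are assumptions; a real stream would hand back slices of its own buffer.

```d
import std.array : appender;

// Hypothetical helper: gather every chunk into one contiguous array.
int[] accumulate(int[][] chunks)
{
    auto app = appender!(int[])();
    foreach (buf; chunks)
        app.put(buf); // copies the elements of buf, so buf may be reused afterwards
    return app.data;
}

unittest
{
    assert(accumulate([[1, 2, 3], [4, 5], [6]]) == [1, 2, 3, 4, 5, 6]);
}
```

Because put copies the elements, the result stays valid even if the  
stream later overwrites its buffer.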

One thing that worries me: if you did something like  
array(bufferedStream), it would accumulate N references to the same  
buffer rather than N distinct chunks, which wouldn't be what you want at  
all.  Of course, you could map over the stream and dup each buffer first.
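
To make the aliasing concern concrete, here is a sketch under stated  
assumptions: ReusingChunks is a made-up range whose front always returns  
the same internal buffer, standing in for a buffered stream.

```d
import std.algorithm : map;
import std.array : array;

// Hypothetical stand-in for a buffered stream: every front is the same
// internal buffer, and popFront overwrites it with the next value.
struct ReusingChunks
{
    private int[] buf;
    private int next;
    private int limit;

    this(int limit)
    {
        this.limit = limit;
        buf = new int[2];
        fill();
    }

    private void fill() { buf[] = next; }

    @property bool empty() const { return next >= limit; }
    @property int[] front() { return buf; }
    void popFront() { ++next; if (!empty) fill(); }
}

unittest
{
    // Naive collection: all three elements alias one buffer, which now
    // holds only the last contents written to it.
    auto aliased = array(ReusingChunks(3));
    assert(aliased[0] is aliased[2]);
    assert(aliased[0] == [2, 2]);

    // Duplicating each chunk before collecting keeps them distinct.
    auto copied = array(ReusingChunks(3).map!(a => a.dup));
    assert(copied == [[0, 0], [1, 1], [2, 2]]);
}
```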

>> 3. An ultimate goal of the i/o streaming package should be to be able to
>> do this:
>>
>> auto x = new XmlParser("<rootElement></rootElement>");
>>
>> or at least
>>
>> auto x = new XmlParser(buffered("<rootElement></rootElement>"));
>>
>> So I think arrays need to be able to be treated as buffering streams. I
>> tried really hard to think of some way to make this work with my existing
>> system, but I don't think it will without unnecessary baggage, and losing
>> interoperability with existing range functions.
>
> I think we can create a generic abstraction buffered() that layers  
> buffering on top of an input range. If the input range has unbuffered  
> read capability, buffered() would use it. Otherwise, it would fall back  
> to loops using empty, front, and popFront.

Right, this is different from my proposed buffer implementation, which  
puts a buffer on top of an unbuffered input *stream*.  But of course, we  
can define it for both, since it will be a compile-time interface.
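
A rough sketch of the range-backed variant, covering only the fallback  
path that loops over empty, front, and popFront.  Buffered, buffered, and  
the capacity parameter are hypothetical names, not an existing Phobos API,  
and the stream-backed case is not shown.

```d
import std.range : ElementType, isInputRange;
import std.traits : Unqual;

// Hypothetical adapter: presents an element-wise input range as a
// range of chunks, each chunk being a slice of one reused buffer.
struct Buffered(R) if (isInputRange!R)
{
    private alias E = Unqual!(ElementType!R);
    private R source;
    private E[] buf;
    private size_t filled;

    this(R source, size_t capacity)
    {
        assert(capacity > 0, "buffer capacity must be nonzero");
        this.source = source;
        buf = new E[capacity];
        refill();
    }

    // Fallback path: pull elements one at a time from the source.
    private void refill()
    {
        filled = 0;
        while (filled < buf.length && !source.empty)
        {
            buf[filled++] = source.front;
            source.popFront();
        }
    }

    @property bool empty() const { return filled == 0; }
    @property E[] front() { return buf[0 .. filled]; }
    void popFront() { refill(); }
}

auto buffered(R)(R source, size_t capacity = 4096) if (isInputRange!R)
{
    return Buffered!R(source, capacity);
}

unittest
{
    int[][] seen;
    foreach (chunk; buffered([1, 2, 3, 4, 5], 2))
        seen ~= chunk.dup; // dup because the buffer is reused
    assert(seen == [[1, 2], [3, 4], [5]]);
}
```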

>> Where does this leave us?
>>
>> 1. I think we need, as Andrei says, an unbuffered streaming abstraction.
>> I think I have this down pretty solidly in my current std.io.
>
> Great. What are the primitives?

See here:
https://github.com/schveiguy/phobos/blob/new-io2/std/io.d#L170

The primitives are defined through IODevice.  The BufferedStream type is  
going to be redone as a range.

>> 3. An implementation of the above definition hooked to the unbuffered
>> stream abstraction, to be utilized in more specific ranges. But by
>> itself, can be used as an input range or directly by code.
>
> Hah, I can't believe I wrote about the same thing above (and I swear I  
> didn't read yours).

Well, not quite :)  You wrote about it being supported by an underlying  
range; I need to have it supported by an underlying stream.  We probably  
need both.  But yeah, I think we are mostly on the same page here.

>> 4. Specialization ranges for each type of input you want (i.e. byLine,
>> byChunk, textStream).
>
> What is the purpose? To avoid unnecessary double buffering?

No, a specialization range *uses* a buffer range as its backing.  A buffer  
range I think is necessarily going to be a reference type (probably a  
class). The specialized range won't replace the buffer range, in other  
words.

Something like byLine is going to do the work of extracting lines from the  
buffer, and it will reference the buffer data directly.  But it won't  
reimplement buffering.
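
As an illustration of that split, here is a sketch in which the line  
range only scans and slices while a separate reference-type buffer owns  
the data.  ArrayBuffer and its window/extend/consume primitives are  
invented for this example (an array stands in for the stream), and  
ByLine here is not std.stdio's byLine.

```d
import std.algorithm : equal, min;

// Hypothetical buffer range backing: a reference type exposing a
// growable window over its data.
final class ArrayBuffer
{
    private const(char)[] data; // whole input, standing in for a stream
    private size_t start, end;  // current window is data[start .. end]
    private size_t step;        // how much each extend() makes available

    this(const(char)[] data, size_t step)
    {
        this.data = data;
        this.step = step;
    }

    const(char)[] window() { return data[start .. end]; }

    // Makes more data available; returns how many chars were added.
    size_t extend()
    {
        auto n = min(step, data.length - end);
        end += n;
        return n;
    }

    // Drops n chars from the front of the window.
    void consume(size_t n) { start += n; }
}

// Specialization range: finds line boundaries, leaves buffering alone.
struct ByLine
{
    private ArrayBuffer buf;
    private const(char)[] line;
    private bool done;

    this(ArrayBuffer buf) { this.buf = buf; popFront(); }

    @property bool empty() const { return done; }
    @property const(char)[] front() { return line; }

    void popFront()
    {
        size_t scanned = 0;
        for (;;)
        {
            auto w = buf.window();
            for (; scanned < w.length; ++scanned)
            {
                if (w[scanned] == '\n')
                {
                    line = w[0 .. scanned]; // slice of the buffer, no copy
                    buf.consume(scanned + 1);
                    return;
                }
            }
            if (buf.extend() == 0)
            {
                // Input exhausted: emit any final unterminated line.
                done = w.length == 0;
                line = w;
                buf.consume(w.length);
                return;
            }
        }
    }
}

unittest
{
    assert(equal(ByLine(new ArrayBuffer("ab\ncd", 2)), ["ab", "cd"]));
}
```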

>> 5. Full replacement option of File backend. File will start out with
>> C-supported calls, but any "promotion" to using a more D-like range type
>> will result in switching to a D-based stream using the above mechanisms.
>> Of course, all existing code should compile that does not try to assume
>> the File always has a valid FILE *.
>
> This will be tricky but probably doable.

Doing this will unify all the i/o packages together into one interface --  
File.  I think it's a bad story for D if you have 2 ways of doing i/o (or  
at least 2 ways of doing the *same thing* with i/o).

-Steve