std.stream replacement

Thu Mar 7 04:42:24 PST 2013

On Wed, 06 Mar 2013 20:15:31 -0500, BLM768 <blm768 at gmail.com> wrote:

> On Wednesday, 6 March 2013 at 16:36:38 UTC, Steven Schveighoffer wrote:
>> On Tue, 05 Mar 2013 18:24:22 -0500, BLM768 <blm768 at gmail.com> wrote:
>>>
>>> Ranges aren't necessarily higher- or lower-level than streams; they're  
>>> completely orthogonal ways of looking at a data source. It's  
>>> completely possible to build a stream interface on top of a range of  
>>> characters, which is what I was suggesting. In that situation, the  
>>> range is at a lower level of abstraction than the stream is.
>>
>> I think you misunderstand.  Ranges absolutely can be a source for  
>> streams, especially if they are arrays.  The point is that the range  
>> *interface* doesn't make a good stream interface.  So we need to invent  
>> new methods to access streams.
>
> Although I probably didn't communicate it very well, my idea was that  
> since we already have functions like std.conv.parse that essentially  
> provide parts of a stream interface on top of ranges, the most  
> convenient way to implement a stream might be to build it on top of a  
> range interface so no code duplication is needed.

My point is, we should not build streams from ranges.  We have to  
establish terminology here.  A range is an API which provides a way to  
iterate over each element in a source using the methods front, popFront,  
and empty.
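
To make the terminology concrete, here is a minimal sketch of a type that satisfies that API.  The Countdown type is made up for illustration; only isInputRange is an existing Phobos name.

```d
import std.range : isInputRange;

// Hypothetical example type: yields n, n-1, ..., 1.
struct Countdown
{
    int n;

    @property bool empty() const { return n <= 0; }
    @property int front() const { return n; }
    void popFront() { --n; }
}

// The three primitives are all that the range API asks for.
static assert(isInputRange!Countdown);

unittest
{
    int sum;
    foreach (x; Countdown(3))   // foreach uses empty/front/popFront
        sum += x;
    assert(sum == 6);
}
```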

A basic stream provides a single function: read.  This function reads N  
bytes into an array, and advances the stream position.  Not a range, an  
array.  That is the basic building block that the OS gives us.  You can  
make read out of front, popFront, and empty, but it's going to be horribly  
low-performing, and I see no benefit to have read sit alongside the range  
primitives.
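
A rough sketch of the difference, assuming a read that fills a ubyte[] and returns the number of bytes copied.  Both function names are hypothetical, and neither is a proposed std.io signature; the point is only that the range version moves one element per iteration while the array version is a single slice copy.

```d
import std.algorithm : min;
import std.range;

// Hypothetical helper (not in Phobos): a stream-style read built only
// from empty/front/popFront. It moves one element per loop iteration.
size_t readFromRange(R)(ref R source, ubyte[] dest)
    if (isInputRange!R && is(ElementType!R : ubyte))
{
    size_t n;
    while (n < dest.length && !source.empty)
    {
        dest[n++] = source.front;
        source.popFront();
    }
    return n;   // 0 means the source is exhausted
}

// Hypothetical helper: the same operation when the source is known to
// be an array. The whole transfer is a single slice copy.
size_t readFromArray(ref const(ubyte)[] source, ubyte[] dest)
{
    immutable n = min(source.length, dest.length);
    dest[0 .. n] = source[0 .. n];
    source = source[n .. $];
    return n;
}

unittest
{
    ubyte[] data = [1, 2, 3, 4, 5];
    ubyte[] expected = [1, 2, 3];
    auto buf = new ubyte[3];

    auto r = data;                 // consumed through range primitives
    assert(readFromRange(r, buf) == 3 && buf == expected);
    assert(readFromRange(r, buf) == 2);
    assert(readFromRange(r, buf) == 0);

    const(ubyte)[] a = data;       // consumed as an array
    assert(readFromArray(a, buf) == 3 && buf == expected);
}
```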

On top of that, we provide a buffered stream which manages the array the  
lower-level stream outputs, and allows access to data a chunk at a time.   
What defines that chunk is application-specific.

At a higher level is where ranges and streams meet.  front can provide  
access to a chunk, popFront can move on to the next chunk, and empty maps  
to EOF (last read returned 0 bytes).  That is a great mapping, and I  
expect it will be the preferred interface.  What I want to provide with  
std.io is an easy way to build ranges on top of streams by defining a  
mechanism to build the chunk.
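
As an illustration of that mapping, here is a sketch with fixed-size chunks.  MemoryStream and ChunkRange are hypothetical names invented for this example, not part of std.io, and a real chunk definition would be application-specific as described above.  Note that front reuses one buffer, so a chunk is only valid until the next popFront.

```d
import std.algorithm : min;

// Hypothetical in-memory stream; only read() matters here.
struct MemoryStream
{
    const(ubyte)[] data;

    // Copies up to dest.length bytes and advances; returns 0 at EOF.
    size_t read(ubyte[] dest)
    {
        immutable n = min(data.length, dest.length);
        dest[0 .. n] = data[0 .. n];
        data = data[n .. $];
        return n;
    }
}

// Hypothetical adapter: an input range of chunks built on read().
struct ChunkRange(Stream)
{
    private Stream* stream;
    private ubyte[] buffer;
    private size_t filled;

    this(Stream* stream, size_t chunkSize)
    {
        this.stream = stream;
        buffer = new ubyte[chunkSize];
        popFront();   // prime the first chunk
    }

    // empty maps to EOF: the last read returned 0 bytes.
    @property bool empty() const { return filled == 0; }
    @property const(ubyte)[] front() const { return buffer[0 .. filled]; }
    void popFront() { filled = stream.read(buffer); }
}

unittest
{
    ubyte[] bytes = [1, 2, 3, 4, 5];
    auto s = MemoryStream(bytes);
    size_t[] sizes;
    foreach (chunk; ChunkRange!MemoryStream(&s, 2))
        sizes ~= chunk.length;
    size_t[] expected = [2, 2, 1];
    assert(sizes == expected);
}
```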

But to say that streams are ranges at heart is incorrect.  Streams need  
the read feature, they don't need range features.

Now, if you want to shoehorn a range into a stream, I certainly can see  
how it will be possible.  Extremely slow, but possible.  That should be  
the last resort.  It shouldn't be the foundation.

There is the temptation to say "hey, arrays are ranges, and arrays make  
good stream sources!  Why can't all ranges make good stream sources?"  But  
arrays are good stream sources NOT because they are ranges, but because  
they are arrays.  Reading an array into an array is a noop.

>
>>> Create a range operation like "r.takeArray(n)". You can optimize it to  
>>> take a slice of the buffer when possible.
>>
>> This is not a good idea.  We want streams to be high performance.   
>> Accepting any range, such as a dchar range that outputs one dchar at a  
>> time, is not going to be high performance.
>
> If the function is optimized, it can essentially bypass the range layer  
> and operate directly on the buffer while using the same interface it  
> would use if it were operating on the range. As I understand it, some of  
> the operations in Phobos do that as well when given arrays.

This is the wrong track to take.

There have been quite a few people in the D community that have advocated  
for the syntax:

int[] arr;

auto p = 5 in arr;

Just like AAs.  It looks great!  Why shouldn't we have a way to search for  
data with such a concise interface?  The problem is that it diminishes  
the value of 'in'.  For AAs, this lookup is O(1) amortized; for an array,  
it's O(n).  This means any time a coder sees x in y, he has to consider  
whether that is a "slow lookup" or a "quick lookup".  Not only that, but  
generic code that uses the in operation has to insert caveats "this  
function is O(n) if T is an array, otherwise it's O(1)".  The situation is  
not something we want.

But if you still want to find 5 in arr, there is the not-as-nice, but  
certainly reasonable looking:

auto p = arr.find(5).ptr;
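
A slightly fuller sketch of that line, using only std.algorithm's find and canFind.  One detail to keep in mind: a failed find returns an empty range rather than null, so checking empty (or calling canFind) is the safer test for "not found".

```d
import std.algorithm : canFind, find;
import std.range : empty, front;

unittest
{
    int[] arr = [1, 5, 9];

    auto hit = arr.find(5);        // rest of arr, starting at the match
    assert(!hit.empty && hit.front == 5);
    assert(hit.ptr is &arr[1]);    // pointer to the matching element

    assert(arr.find(7).empty);     // no match: empty range
    assert(arr.canFind(5));        // when a yes/no answer is enough
}
```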

My point is, we don't want any range to substitute for a stream.  I think  
it might be worth considering accepting random-access ranges, or  
slice-assignable ranges to be stream sources, but not just any range.  We  
could provide a "RangeStream" type which shoehorns any range into a  
stream, but I'd want it tucked in some shadowy corner of Phobos, not to be  
used except in emergencies when nothing else will do.  It should be  
discouraged.

>>> Range operations like std.conv.parse implicitly progress their source  
>>> ranges.
>>
>> That's not a range operation.  Range operations are empty, popFront,  
>> front.  Anything built on top of ranges must use ONLY these three  
>> operations, otherwise you are talking about something else.
>
> I guess that's not the right terminology for what I'm trying to express.  
> I was thinking of "operations that act on ranges."

What I don't want is to accept ranges as streams.  For example, if we have  
an isInputStream trait, it should not accept ranges.  But you certainly  
can use existing Phobos functions to shoehorn ranges into a stream-like  
API.
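
To sketch what such a trait might look like: the name isInputStream is from the paragraph above, but the definition below is only an assumption (a type qualifies if it has a read taking a ubyte[] and returning a byte count).  FixedStream and drain are hypothetical names used to exercise it.

```d
// Assumed definition: S is an input stream if s.read(ubyte[]) compiles
// and its result converts to size_t. Ranges have no such member.
enum bool isInputStream(S) =
    is(typeof(S.init.read((ubyte[]).init)) : size_t);

// Hypothetical stream that produces `remaining` zero bytes, then EOF.
struct FixedStream
{
    size_t remaining;

    size_t read(ubyte[] dest)
    {
        immutable n = dest.length < remaining ? dest.length : remaining;
        dest[0 .. n] = 0;
        remaining -= n;
        return n;
    }
}

// Hypothetical generic code that accepts streams only.
size_t drain(S)(ref S stream)
    if (isInputStream!S)
{
    ubyte[16] buf;
    size_t total;
    for (size_t n; (n = stream.read(buf[])) != 0; )
        total += n;
    return total;
}

static assert(isInputStream!FixedStream);
static assert(!isInputStream!(int[]));    // a range is rejected
static assert(!isInputStream!string);

unittest
{
    auto s = FixedStream(40);
    assert(drain(s) == 40);

    int[] arr = [1, 2, 3];
    static assert(!__traits(compiles, drain(arr)));
}
```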

> Ultimately, we do need some type of a traditional stream interface; I  
> was just thinking about using ranges behind the scenes and using  
> existing pieces of the standard library for stream operations rather  
> than putting all of the operations into a unified data type. I'm not  
> sure if it could really be called an "ideal" design, but I do think that  
> it could provide a good minimalist solution with performance that would  
> be acceptable for at least many applications.

I hope my above comments have made clear that I am not against having  
ranges be forcibly changed into streams.  What I don't want is ranges  
implicitly treated as streams.  Certainly, we have a lot of existing  
range-processing code that could be leveraged.  But streams and ranges are  
different concepts, different APIs even.  Building bridges between the two  
should be possible, and ranges will make great interfaces to streams.

-Steve