std.stream replacement
Steven Schveighoffer
schveiguy at yahoo.com
Thu Mar 7 04:42:24 PST 2013
On Wed, 06 Mar 2013 20:15:31 -0500, BLM768 <blm768 at gmail.com> wrote:
> On Wednesday, 6 March 2013 at 16:36:38 UTC, Steven Schveighoffer wrote:
>> On Tue, 05 Mar 2013 18:24:22 -0500, BLM768 <blm768 at gmail.com> wrote:
>>>
>>> Ranges aren't necessarily higher- or lower-level than streams; they're
>>> completely orthogonal ways of looking at a data source. It's
>>> completely possible to build a stream interface on top of a range of
>>> characters, which is what I was suggesting. In that situation, the
>>> range is at a lower level of abstraction than the stream is.
>>
>> I think you misunderstand. Ranges absolutely can be a source for
>> streams, especially if they are arrays. The point is that the range
>> *interface* doesn't make a good stream interface. So we need to invent
>> new methods to access streams.
>
> Although I probably didn't communicate it very well, my idea was that
> since we already have functions like std.conv.parse that essentially
> provide parts of a stream interface on top of ranges, the most
> convenient way to implement a stream might be to build it on top of a
> range interface so no code duplication is needed.
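To make the std.conv.parse behaviour concrete: parse takes its source by ref and advances it past whatever it consumed. The D sketch below shows this. parsePair is a hypothetical helper written for illustration; it is not a Phobos function.

```d
import std.conv : parse;
import std.exception : enforce;

// Hypothetical helper: reads "x,y" from the front of `input` and leaves
// whatever follows in `input`. parse takes its argument by ref, so each
// call advances the caller's string past the characters it consumed.
int[2] parsePair(ref string input)
{
    int[2] result;
    result[0] = parse!int(input);          // "12,34;tail" -> ",34;tail"
    enforce(input.length && input[0] == ',', "expected ','");
    input = input[1 .. $];                 // skip the separator by slicing
    result[1] = parse!int(input);          // "34;tail" -> ";tail"
    return result;
}
```

Calling parsePair on a string holding "12,34;tail" yields [12, 34] and leaves ";tail" in the caller's variable, which is the stream-like "consume and advance" behaviour being discussed.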
My point is, we should not build streams from ranges. We have to
establish terminology here. A range is an API which provides a way to
iterate over each element in a source using the methods front, popFront,
and empty.
A basic stream provides a single function: read. This function reads up to
N bytes into an array, returns how many it read, and advances the stream
position. Not a range, an array. That is the basic building block that the
OS gives us. You can make read out of front, popFront, and empty, but it's
going to be horribly low-performing, and I see no benefit in having read
sit alongside the range primitives.
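A minimal D sketch of the difference, assuming a stream is any type with a `size_t read(ubyte[] buf)` method that returns 0 at EOF. ArrayStream and RangeStream are hypothetical names used only for illustration: the first fills the caller's buffer with one bulk copy, the second has to call empty, front and popFront once per byte.

```d
import std.algorithm.comparison : min;
import std.range.primitives : empty, front, popFront, isInputRange, ElementType;

// Hypothetical minimal stream: one primitive, read, which fills up to
// buf.length bytes and returns how many were written (0 means EOF).
struct ArrayStream
{
    const(ubyte)[] data;

    size_t read(ubyte[] buf)
    {
        immutable n = min(buf.length, data.length);
        buf[0 .. n] = data[0 .. n];   // one bulk copy
        data = data[n .. $];          // advance the stream position
        return n;
    }
}

// Hypothetical adapter that builds read out of front/popFront/empty.
// It works for any input range of ubyte, but pays for three range
// calls per byte instead of a single bulk copy.
struct RangeStream(R)
    if (isInputRange!R && is(ElementType!R : ubyte))
{
    R range;

    size_t read(ubyte[] buf)
    {
        size_t n = 0;
        while (n < buf.length && !range.empty)
        {
            buf[n++] = range.front;
            range.popFront();
        }
        return n;
    }
}
```

Both types present the same read signature, so code written against read cannot tell them apart; the cost difference only shows up at run time.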
On top of that, we provide a buffered stream which manages the array that
the lower-level stream reads into, and allows access to the data a chunk
at a time. What defines a chunk is application-specific.
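One possible shape for the buffered layer, sketched in D under the same assumption (the source exposes `size_t read(ubyte[])`). BufferedStream, window, consume and fill are hypothetical names; the point is only that this layer owns the array, while the caller decides how much of it forms a chunk.

```d
// Hypothetical buffered layer over any source with `size_t read(ubyte[])`.
// It owns the buffer; callers inspect `window` and report how much they used.
struct BufferedStream(Source)
{
    Source source;
    ubyte[] buffer;      // storage owned by this layer
    size_t start, end;   // window = buffer[start .. end]

    this(Source src, size_t capacity)
    {
        source = src;
        buffer = new ubyte[capacity];
    }

    // Currently buffered, not yet consumed bytes.
    const(ubyte)[] window() const { return buffer[start .. end]; }

    // Mark the first n bytes of the window as used.
    void consume(size_t n)
    {
        assert(n <= end - start);
        start += n;
    }

    // Slide unconsumed bytes to the front, then read more after them.
    // Returns the number of new bytes; 0 means EOF or a full buffer.
    size_t fill()
    {
        import core.stdc.string : memmove;
        immutable len = end - start;
        if (start > 0)
        {
            // Regions may overlap, so memmove rather than slice assignment.
            memmove(buffer.ptr, buffer.ptr + start, len);
            start = 0;
            end = len;
        }
        immutable got = source.read(buffer[end .. $]);
        end += got;
        return got;
    }
}
```

A line reader, for example, would call fill until window contains a newline, hand out that slice, and then consume its length.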
At a higher level is where ranges and streams meet. front can provide
access to a chunk, popFront can move on to the next chunk, and empty maps
to EOF (last read returned 0 bytes). That is a great mapping, and I
expect it will be the preferred interface. What I want to provide with
std.io is an easy way to build ranges on top of streams by defining a
mechanism to build the chunk.
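The mapping onto range primitives could look like this in D. ChunkRange is a hypothetical adapter assuming a source with `size_t read(ubyte[])`; here a chunk is simply "whatever one read returned", which is the simplest possible chunk definition.

```d
// Hypothetical adapter: presents any source with `size_t read(ubyte[])`
// as an input range of chunks. front is the current chunk, popFront reads
// the next one, and empty is true once a read returns 0 bytes (EOF).
struct ChunkRange(Source)
{
    private Source source;
    private ubyte[] buffer;
    private size_t filled;

    this(Source src, size_t chunkSize)
    {
        source = src;
        buffer = new ubyte[chunkSize];
        popFront();                  // prime the first chunk
    }

    bool empty() const { return filled == 0; }

    // A view into the internal buffer; the next popFront overwrites it.
    const(ubyte)[] front() const { return buffer[0 .. filled]; }

    void popFront() { filled = source.read(buffer); }
}
```

Because front returns a view of a buffer that the next popFront reuses, a caller that wants to keep a chunk has to .dup it.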
But to say that streams are ranges at heart is incorrect. Streams need
the read feature; they don't need range features.
Now, if you want to shoehorn a range into a stream, I can certainly see
how it would be possible. Extremely slow, but possible. That should be
the last resort. It shouldn't be the foundation.
There is the temptation to say "hey, arrays are ranges, and arrays make
good stream sources! Why can't all ranges make good stream sources?" But
arrays are good stream sources NOT because they are ranges, but because
they are arrays. Reading an array into an array is essentially a no-op: a
slice, or at worst a single block copy.
>
>>> Create a range operation like "r.takeArray(n)". You can optimize it to
>>> take a slice of the buffer when possible.
>>
>> This is not a good idea. We want streams to be high performance.
>> Accepting any range, such as a dchar range that outputs one dchar at a
>> time, is not going to be high performance.
>
> If the function is optimized, it can essentially bypass the range layer
> and operate directly on the buffer while using the same interface it
> would use if it were operating on the range. As I understand it, some of
> the operations in Phobos do that as well when given arrays.
This is the wrong track to take.
There have been quite a few people in the D community who have advocated
for the syntax:

    int[] arr;
    auto p = 5 in arr;

Just like AAs. It looks great! Why shouldn't we have a way to search for
data with such a concise interface? The problem is that this diminishes
the value of 'in'. For AAs, this lookup is O(1) amortized; for an array,
it's O(n). This means any time a coder sees x in y, he has to consider
whether that is a "slow lookup" or a "quick lookup". Not only that, but
generic code that uses the in operation has to insert caveats: "this
function is O(n) if T is an array, otherwise it's O(1)". That is not a
situation we want.
But if you still want to find 5 in arr, there is the not-as-nice, but
certainly reasonable looking:

    auto p = arr.find(5).ptr;

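A D sketch of why the two lookups deserve different spellings: the AA lookup is amortized O(1) and returns null on a miss, while the array search is a linear scan. lookupAA and lookupArray are hypothetical helper names. One detail worth knowing: when the value is missing, find returns an empty slice positioned at the end of the array, so its .ptr is one past the last element rather than null; the array helper below checks the length first.

```d
import std.algorithm.searching : find;

// AA lookup: `in` is amortized O(1) and yields a pointer, or null.
int* lookupAA(int[string] aa, string key)
{
    return key in aa;
}

// Array search: an O(n) scan, and the name at the call site says so.
// find returns the suffix starting at the match (empty if there is none),
// so check its length before taking .ptr.
const(int)* lookupArray(const(int)[] arr, int value)
{
    auto rest = arr.find(value);
    return rest.length ? rest.ptr : null;
}
```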
My point is, we don't want arbitrary ranges to substitute for streams. I
think it might be worth accepting random-access ranges or slice-assignable
ranges as stream sources, but not just any range. We could provide a
"RangeStream" type which shoehorns any range into a stream, but I'd want
it tucked away in some shadowy corner of Phobos, used only in emergencies
when nothing else will do. It should be discouraged.
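A D sketch of the narrower kind of source suggested above, again assuming a stream is anything with `size_t read(ubyte[])`. SliceStream is a hypothetical name. Its template constraint admits only random-access ranges with a length and slicing, so read can move a whole block per call; a plain input range fails the constraint at compile time and would need an explicit, slow adapter instead.

```d
import std.algorithm.comparison : min;
import std.algorithm.mutation : copy;
import std.range.primitives : hasLength, hasSlicing, isRandomAccessRange,
    ElementType;

// Hypothetical source restricted to random-access, sliceable ranges with
// a length, so read copies a block at a time instead of looping over
// front/popFront.
struct SliceStream(R)
    if (isRandomAccessRange!R && hasSlicing!R && hasLength!R
        && is(ElementType!R : ubyte))
{
    R range;

    size_t read(ubyte[] buf)
    {
        immutable n = min(buf.length, range.length);
        copy(range[0 .. n], buf[0 .. n]);   // block copy
        range = range[n .. range.length];   // advance the position
        return n;
    }
}
```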
>>> Range operations like std.conv.parse implicitly progress their source
>>> ranges.
>>
>> That's not a range operation. Range operations are empty, popFront,
>> front. Anything built on top of ranges must use ONLY these three
>> operations, otherwise you are talking about something else.
>
> I guess that's not the right terminology for what I'm trying to express.
> I was thinking of "operations that act on ranges."
What I don't want is to accept ranges as streams. For example, if we have
an isInputStream trait, it should not accept ranges. But you certainly
can use existing Phobos functions to shoehorn ranges into a stream-like
API.
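A sketch of such a trait in D. isInputStream and countBytes are hypothetical names, and the definition assumes the stream primitive is `size_t read(ubyte[])`. An array satisfies Phobos's isInputRange but not this trait, so it cannot be passed to countBytes without an explicit wrapper.

```d
// Hypothetical trait: T is an input stream if it has a read(ubyte[])
// method returning the number of bytes read. Plain ranges
// (front/popFront/empty) do not satisfy it unless explicitly wrapped.
enum bool isInputStream(T) = is(typeof((ref T t) {
    ubyte[] buf;
    size_t n = t.read(buf);
}));

// Example consumer that only accepts streams.
size_t countBytes(S)(ref S stream) if (isInputStream!S)
{
    ubyte[64] buf;
    size_t total, n;
    while ((n = stream.read(buf[])) != 0)
        total += n;
    return total;
}
```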
> Ultimately, we do need some type of a traditional stream interface; I
> was just thinking about using ranges behind the scenes and using
> existing pieces of the standard library for stream operations rather
> than putting all of the operations into a unified data type. I'm not
> sure if it could really be called an "ideal" design, but I do think that
> it could provide a good minimalist solution with performance that would
> be acceptable for at least many applications.
I hope my comments above have made clear that I am not against explicitly
converting ranges into streams. What I don't want is ranges implicitly
treated as streams. Certainly, we have a lot of existing range-processing
code that could be leveraged. But streams and ranges are different
concepts, different APIs even. Building bridges between the two should be
possible, and ranges will make great interfaces to streams.
-Steve
More information about the Digitalmars-d mailing list