let's talk about output ranges

Thu Feb 6 10:48:22 PST 2014

On Thursday, 6 February 2014 at 18:10:53 UTC, H. S. Teoh wrote:
> Would it make sense to have a .full method/property analogous 
> to input ranges' .empty? Perhaps something like:

That could be ok but I agree that having to check is a pain. 
Besides, what are you going to do if it is full? Suppose I call

char[1] b;
toUpper("lol", staticSink(b[]));

The best toUpper could do is truncate or throw.... and I think 
that would be better encoded in the range itself so the caller 
can decide.

auto got = toUpper("lol", staticSinkTruncating(b[]));
if(got.length < "lol".length) {
    // we could perhaps call it again to process the rest
}

(put in the truncating one would just check its own length and 
noop if it is full)

Otherwise, a range violation or out of memory error getting
thrown is what would most likely happen anyway.
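
To make the truncating idea concrete, here is a rough sketch of what
such a sink could look like. StaticSinkTruncating and toUpperInto are
names I made up for illustration, they are not in Phobos, and the
upper-casing is ASCII only to keep it short.

```d
// Try with: dmd -unittest -main -run sink.d
import std.range.primitives : isOutputRange;

// Hypothetical helper, not part of Phobos: writes into a caller-owned
// buffer and silently drops whatever does not fit.
struct StaticSinkTruncating
{
    char[] buffer;   // caller-owned storage, e.g. a slice of a static array
    size_t used;     // number of chars written so far

    void put(char c)
    {
        if (used < buffer.length)
            buffer[used++] = c;
        // else: the sink is full, so put is a no-op
    }

    // The part of the buffer that was actually filled.
    char[] data() { return buffer[0 .. used]; }
}

static assert(isOutputRange!(StaticSinkTruncating, char));

// Stand-in for an output-range flavoured toUpper (assumption: ASCII only).
char[] toUpperInto(Sink)(const(char)[] s, ref Sink sink)
{
    import std.ascii : toUpper;
    foreach (char c; s)
        sink.put(cast(char) toUpper(c));
    return sink.data;
}

unittest
{
    char[1] b;
    auto sink = StaticSinkTruncating(b[]);
    auto got = toUpperInto("lol", sink);
    assert(got == "L");
    assert(got.length < "lol".length); // caller can see it was truncated
}
```

The point is that the policy (truncate here, throw in some other
sink) lives in the range type, so the algorithm never has to ask
whether the sink is full.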

> One thing that
> the current output range API doesn't do very well is chaining.

Indeed. In my other post, I just wrote about finish. Finish 
serves to flush the buffer (digests or compression algorithms for 
example might need to be padded to block size), could finalize 
things (suppose an appender which just puts the pieces into a 
static array, then calls join all at once at the end), and could 
also just generally return the result.

There are some cases where returning from finish doesn't make 
sense, such as if you sunk to a file, you wouldn't keep an array 
of the contents around... but finish is still potentially useful 
in that it could close the file or release a lock. (Of course, 
dtors could do that too. But destructors can never return data to 
the user - that's where finish is special.)

Anyway, not all output ranges would offer finish and not all 
would return T[]. But not all input ranges offer opSlice either 
so we're still in analogous territory.
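
A minimal sketch of the finish idea, assuming an appender-like sink
that only collects the pieces and joins them once at the end.
JoiningSink is a hypothetical name, and finish is the proposed
method from this thread, not an existing Phobos primitive.

```d
// Try with: dmd -unittest -main -run finish.d
import std.array : join;
import std.range.primitives : isOutputRange;

// Hypothetical sink: put just remembers the pieces, finish does the
// real work once and returns the result to the caller.
struct JoiningSink
{
    string[] pieces;

    void put(string piece) { pieces ~= piece; }

    // Finalize: join everything collected so far and hand it back.
    // A destructor could release resources, but could not return this.
    string finish()
    {
        auto result = pieces.join();
        pieces = null;
        return result;
    }
}

static assert(isOutputRange!(JoiningSink, string));

unittest
{
    JoiningSink sink;
    sink.put("hello, ");
    sink.put("world");
    assert(sink.finish() == "hello, world");
}
```

A file-backed sink could offer a finish that returns nothing and
just closes the file, which is the "not all would return T[]" case.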

> This is a big usability hindrance. Ideally we'd want to write 
> something
> like:
>
> 	auto result = "mystring".toUpper(ArcOutputRange!string())
> 				.translate("abc", "def");
>
> But I'm not sure how this can be made to work.

hmmm.... finish doesn't account for all that.... well, I guess it 
could by returning a range.

tbh toUpper might be better as a higher-order input range. Like 
alias toUpper = map!charToUpper(...). Those chain, they don't 
allocate, and they are well-defined right now.

Then at the end we build the result lazily and just put it all at 
once into the output range.

"mystring".toUpper.translate("abc","def").array(ArcOutputRange!string());

Yeah, I actually think that's the way to go. And calling .array 
at the end is nothing new to Phobos anyway. It'd be a bit weird 
doing it with toUpper but I think it really is the best fit.

(BTW I would be PISSED if toUpper actually changed like this. 
It'd break a bunch of code and I don't really care that toUpper 
allocates. I want it to just work. But we could offer equivalent 
functionality via per-character functions and map so we don't 
have to break code to offer the new options.)
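
Here is roughly what the per-character functions plus map approach
could look like with what Phobos has today. translateChar is a
hypothetical helper I wrote for the example (ASCII only), and I used
a plain appender as the output range since ArcOutputRange doesn't
exist.

```d
// Try with: dmd -unittest -main -run lazychain.d
import std.algorithm.iteration : map;
import std.array : appender;
import std.ascii : toUpper;
import std.string : indexOf;

// Hypothetical per-character translate: maps from[i] to to[i] and
// leaves every other character alone. Assumes ASCII tables.
dchar translateChar(dchar c, string from, string to)
{
    auto i = from.indexOf(c);
    return i < 0 ? c : to[i];
}

unittest
{
    // Both steps are lazy: no allocation happens while chaining.
    auto lazyResult = "mystring"
        .map!(c => toUpper(c))
        .map!(c => translateChar(c, "MS", "ms"));

    // Only at the very end do we push everything into an output range.
    auto sink = appender!string();
    foreach (dchar c; lazyResult)
        sink.put(c);

    assert(sink.data == "mYsTRING");
}
```

The existing allocating toUpper stays untouched, so nothing breaks;
the lazy pieces are just an extra option next to it.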

> So we should extend put() to take an index, then?

that would work.

> An allocator is definitely not an output range!

yup, and I don't think a static array is either. A static array 
isn't an input range either, since you can't do a = a[1..$]. But 
offering easy getters for such is easy and it rox.
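
A quick check of that, as far as I understand the current range
primitives: the static array itself fails the range test, but the
slice you get from it with [] passes and can be put into.

```d
// Try with: dmd -unittest -main -run slices.d
import std.range.primitives : isInputRange, put;

unittest
{
    // A static array can't shrink, so it is not a range...
    static assert(!isInputRange!(int[4]));
    // ...but a slice of it is.
    static assert( isInputRange!(int[]));

    int[4] storage;
    int[] window = storage[];   // the "easy getter": slice the array

    put(window, 1);             // writes to window.front, then pops it
    put(window, 2);

    assert(storage == [1, 2, 0, 0]);
    assert(window.length == 2); // the slice shrank, the array did not
}
```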

> into a data sink should not care what an allocator is; they 
> should take an output range.

Actually, I think they should generate lazy input ranges whenever 
possible. Then only at the end do we send it to the output range. 
The catch is that input ranges aren't allowed to allocate, since 
that would kill their complexity guarantee, so what we need is an 
example of a function which *must* allocate up front.

Those are the functions that want the random access output range. 
Otherwise we can just put at the end.

> Let
> stdout do the buffering, and let toLower send the data to stdout
> directly. Calling an allocator from toLower essentially amounts 
> to buffering the data twice.

yes

> They should probably be *always* passed by ref, otherwise you 
> could end up with some pathological behaviour of data from 
> multiple sources overwriting each other because they were 
> operating on copies of output ranges instead of references to a 
> single one.

That won't necessarily work though, you can't have a ref default 
parameter. But we can use pimpl or something to force a regular 
struct to be a ref item. Lazy initialization can be surprising, 
but we deal with that already with array slices so I think it is 
ok.
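
To show the copy problem and the pimpl way around it, a small sketch.
ValueSink, SharedSink and writeAll are all made-up names for the
example; the only assumption is that the algorithm takes its sink by
value, like a default parameter would force it to.

```d
// Try with: dmd -unittest -main -run refsink.d

// Plain value type: every copy carries its own write position.
struct ValueSink
{
    char[] buffer;
    size_t used;
    void put(char c) { buffer[used++] = c; }
}

// Pimpl-style: the state lives behind a pointer, so copies of the
// struct all refer to the same write position.
struct SharedSink
{
    private static struct State { char[] buffer; size_t used; }
    private State* state;

    this(char[] buffer) { state = new State(buffer, 0); }
    void put(char c) { state.buffer[state.used++] = c; }
    size_t used() const { return state.used; }
}

// Note: the sink is taken by value, not by ref.
void writeAll(Sink)(Sink sink, string s)
{
    foreach (char c; s)
        sink.put(c);
}

unittest
{
    char[4] a;
    auto v = ValueSink(a[]);
    writeAll(v, "ab");
    writeAll(v, "cd");          // the copy starts at index 0 again
    assert(a[0 .. 2] == "cd");  // "ab" got overwritten
    assert(v.used == 0);        // the original never advanced

    char[4] b;
    auto s = SharedSink(b[]);
    writeAll(s, "ab");
    writeAll(s, "cd");          // the copy shares the position
    assert(b[] == "abcd");
    assert(s.used == 4);
}
```

SharedSink allocates its state eagerly in the constructor. Doing it
lazily on first put is where the surprise comes in, since a copy made
before the first put would end up with its own state.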

> Also, delegates and function pointers should be treated as 
> output ranges as well (Phobos should define .put and whatever
> other needed methods for them via UFCS).

Yes, indeed.
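
As far as I can tell the free function put in current Phobos already
handles callables, so a delegate works as a sink when you go through
put(sink, x) instead of sink.put(x). A small sketch:

```d
// Try with: dmd -unittest -main -run dgsink.d
import std.range.primitives : isOutputRange, put;

unittest
{
    string log;

    // A delegate that appends whatever it is given to log.
    void delegate(const(char)[]) sink = (const(char)[] s) { log ~= s; };

    static assert(isOutputRange!(typeof(sink), const(char)[]));

    put(sink, "abc");   // ends up calling sink("abc")
    put(sink, "def");

    assert(log == "abcdef");
}
```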

> Doesn't solve the case where you call some library function 
> that throws, though. :-(

at least there's nothrow if it is really that important to us.