let's talk about output ranges

Thu Feb 6 10:09:23 PST 2014

On Thu, Feb 06, 2014 at 04:21:51PM +0000, Adam D. Ruppe wrote:
> splitting from the ARC thread
> 
> On Thursday, 6 February 2014 at 15:47:48 UTC, Johannes Pfau wrote:
> >As they do not keep references there's no need for ARC or GC,
> >we just need a way to tell every function how it should allocate.
> 
> yea. I think it is time we hashed out output ranges the same way
> we've done with input ranges. Right now, output ranges just have
> put(T). I think it would be useful to add extensions similarly to
> random access ranges, infinite ranges, etc..

+1. I think output ranges (as a concept, not necessarily what's
currently defined in Phobos) are far underused, and have a lot of
untapped potential.

> What I want to see work is:
> char[3] buffer;
> char[] got = toLowerWithRange(buffer, "FOO"); // if we allow it w/o
> slicing
> assert(got.ptr is buffer.ptr);
> assert(got == "foo");
> 
> char[] got = toLowerWithRange(buffer, "FOOL"); // throws, out no
> more space
> 
> char[] got = toLowerWithRange(buffer[], "FOO"); // same result as
> above
> char[] got = toLowerWithRange(buffer[], "FOOL"); // MAyBE
> reallocates
> 
> (I call it toLowerWithRange just to avoid any overloading
> ambiguities when reusing the toLower name. Names aren't important to
> me though.)
> 
> What do we have here? When passed a static array, it must not grow
> beyond its length - the only allocation allowed is the exception it
> throws.

Would it make sense to have a .full method/property analogous to input
ranges' .empty? Perhaps something like:

	struct MyOutputRange {
		void put(...);
		@property bool full() { ... }
	}

Though it may not be a good idea for the caller to constantly have to
check if something is full before writing to it. Maybe having put()
throw is the better way (then output ranges that allocate will just
allocate when full without the caller needing to worry about it).

> When passed a slice though, things are a bit more interesting. I
> think then it should try to put it directly in the slice, but if
> that isn't possible, go ahead and reallocate.
> 
> Though that's not necessarily sane either. Maybe slice should also
> throw if there isn't enough room.

I think we're jumping the gun to implementation details here. What about
other logical concepts that output ranges should support? One thing that
the current output range API doesn't do very well is chaining. For
example, if you don't care about allocation strategy and just use the
GC, you can write:

	auto result = "mystring".toUpper.translate("abc", "def");

But once you have to pass an explicit output range to it, you can't
chain things anymore:

	auto result1 = ArcOutputRange!string();
	"mystring".toUpper(result1);	// need to pass in result explicitly

	auto result2 = ArcOutputRange!string();
	result.data.translate("abc", "def", result2);

This is a big usability hindrance. Ideally we'd want to write something
like:

	auto result = "mystring".toUpper(ArcOutputRange!string())
				.translate("abc", "def");

But I'm not sure how this can be made to work.

> toLower would prolly just call put(), with std.array or std.range
> defining it for static vs dynamic arrays.

Yes.

> With user defined types, this behavior would be tweakable, similarly
> to how input ranges might hasLength!T. Other similarities:
> 
> * A slice should also be a random access output range. Not all
> values are generated in order, one char at a time.

So we should extend put() to take an index, then?

	struct MyOutputRange {
		void put(T item);	// put item at current index
		void put(T item, size_t idx); // put item at specified index
	}

> * Allocators should *offer* output ranges, but not necessarily *be*
> output ranges, similarly to the relationship between containers and
> input ranges (a view into the container). This could argue that
> passing a static buffer without slicing it is wrong

An allocator is definitely not an output range! You don't put data into
an allocator, you get a memory buffer *from* an allocator that you can
put data into. Big difference. We shouldn't conflate the two.

That's why I said in another post, that algorithms that produce data
into a data sink should not care what an allocator is; they should take
an output range. Logically, it makes more sense: what you want is
something to put your result data into, you don't care how it's
allocated, or even whether it's a chunk of memory (it may be a socket or
pipe or whatever).  Doing it this way also eliminates some needless
allocations -- why do you need to allocate a string if all you want is
to take input data, transform it to lowercase, and write to stdout? Let
stdout do the buffering, and let toLower send the data to stdout
directly. Calling an allocator from toLower essentially amounts to
buffering the data twice.

> * We want this stuff to just work without a bajillion explicit
> wrappers, again, like input ranges.

Yes, which is why I brought up the issue of how to chain function calls
that takes output ranges. That's a big usability issue that we need to
address if output ranges are going to be as cool as input ranges were.

> * Output ranges would probably make the most sense to pass places by
> ref (as is the case with them most the time right now).

They should probably be *always* passed by ref, otherwise you could end
up with some pathological behaviour of data from multiple sources
overwriting each other because they were operating on copies of output
ranges instead of references to a single one.

Also, delegates and function pointers should be treated as output ranges
as well (Phobos should define .put and whatever other needed methods for
them via UFCS).

[...]
> >Exceptions currently can't work on a system without GC cause we
> >always use 'throw new' and nobody ever explicitly frees Exceptions.
> 
> Exceptions only come in exceptional circumstances anyway, so I think
> they are ok to be an... exception. LOL
> 
> but seriously, the exception path is rarely fully optimized in the
> first place. If you want to avoid them, use nothrow or preallocate
> them or something.

Yeah, you could do something like this:

	void myFailureProneAlgorithm(T...)(T args) {
		auto prealloc_exception = ARCAllocator!Exception();
		auto stacktraceBuffer = /* preallocate stacktrace here */;

		auto result = doBigComplicatedCalculation();
		if (result.hasError) {
			// Fill in exception info without allocation
			prealloc_exception.msg = result.err_msg;
			createStackTrace(stacktraceBuffer);
			prealloc_exception.stacktrace = stacktraceBuffer;

			// N.B.: no 'new'.
			throw prealloc_exception;
		}
		return result;
	}

	void main() {
		try {
			auto input = ...;
			auto output = myFailureProneAlgorithm(input);
		} catch(Exception e) {
			... // you get a statically allocated exception here
		}
	}

Doesn't solve the case where you call some library function that throws,
though. :-(

T

-- 
One disk to rule them all, One disk to find them. One disk to bring them all and in the darkness grind them. In the Land of Redmond where the shadows lie. -- The Silicon Valley Tarot