Transience of .front in input vs. forward ranges

Tue Oct 30 15:29:17 PDT 2012

Now that Andrei is back (I think?), I want to bring up this discussion
again, because I think it's important.

Recently, in another thread, it was found that std.algorithm.joiner
doesn't work properly with input ranges whose .front value is
invalidated by popFront(). Andrei stated that for input ranges .front
should not be assumed to return a persistent value, whereas for forward
ranges, .front can be assumed to be persistent. However, Jonathan
believes that .front should never be transient.

Obviously, both cannot be the case simultaneously. So we need to decide
exactly what it should be, because the current situation is subtly
broken, and this subtle brokenness is pervasive. For example, I recently
rewrote joiner to eliminate the assumption that .front is persistent,
only to discover that in the unittests, I can't use array() or equal()
(or, for that matter, writefln()), because they apparently all make this
assumption at some point (I didn't bother to find out exactly where).

In other words, right now input ranges really only work with arrays and
array-like objects. Not the generic ranges that Andrei has in mind in
his article "On Iteration". Many input ranges will subtly break, the
prime whipping boy example being byLine (which I hate to bring up
because it does not represent the full scope of such transient ranges),
a range that returns in-place permutations of an array, or anything that
reuses a buffer, really.

This situation isn't as simple as input ranges being transient and
forward ranges not, though. I want to bring another example besides the
dead horse byLine() to the spotlight. Let's say you have a range R that
spans all permutations of some starting array A. For efficiency reasons,
we don't want to allocate a new array every time we return a
permutation; so we have an internal buffer in R that holds the current
permutation, which is what .front returns. Then popFront() simply
permutes this buffer in-place. Something like this:

	struct AllPermutations(T) {
		T[] front, first;
		bool done;

		this(T[] initialBuf) {
			first = initialBuf;
			front = first.dup;
		}
		void popFront() {
			nextPermutation(current);
			if (equal(front, first))
				done = true;
		}
		@property bool empty() {
			return !done;
		}
	}

This is an input range, according to Andrei's definition. The value of
.front is transient, since popFront() modifies it in-place. According to
Jonathan's definition, however, this isn't a valid range for that very
reason.

Now consider what happens if we add this member:

	auto save() {
		AllPermutations!T copy;
		copy.front = this.front;
		copy.first = this.first;
		copy.done = this.done;
		return copy;
	}

This returns a separate instance of the same range, starting with the
current permutation, and ending with the original permutation, as
before. I submit that this makes it a forward range. However, this fails
to be a forward range under Andrei's definition, because forward ranges
require .front to be persistent. So we'd have to modify the range to be
something like this:

	struct AllPermutations(T) {
		T[] current, first;
		bool done;

		this(T[] initialBuf) {
			first = initialBuf;
			current = first.dup;
		}
		void popFront() {
			nextPermutation(current);
			if (equal(current, first))
				done = true;
		}
		@property bool empty() {
			return !done;
		}
		@property T[] front() {
			return current.dup; // <--- note this line
		}
		auto save() {
			AllPermutations!T copy;
			copy.front = this.front;
			copy.first = this.first;
			copy.done = this.done;
			return copy;
		}

	}

Note that now we have to duplicate the output array every time .front is
accessed. So whatever gains we may have had by using nextPermutation to
modify the array in-place is lost, just so that we can conform to an
arbitrary standard of what a forward range is.

Under Jonathan's definition, we'd have to incur this cost regardless of
whether we had save() or not, since .front is *always* required to be
persistent.

But I propose that the correct solution is to recognize that whether or
not .front is transient is orthogonal to whether a range is an input
range or a forward range. Many algorithms actually don't care if .front
is persistent or not (at least in theory), including equal(), find(),
map(), count(), reduce(), joiner(). Maybe they *currently* assume that
.front is persistent, but they are easily implementable *without* this
assumption (I have an implementation of joiner that doesn't make this
assumption).

Not allowing forward ranges to have transient .front values, or not
allowing transient .front values at all, will introduce the arbitrary
need for lots of implicit copying, which makes D code inefficient. It
will also limit the applicability of std.algorithm and std.range (if
said the cost of said copying is unacceptable for a particular
application).

The problem, of course, is how to check of a range has transient .front
or not. I proposed adding a .isTransient member to the range, but this
depends on the range implementor to remember to mark the range with
.isTransient=true, and we know that coding by convention is not
scalable. So here's another idea:

	template isTransient(R) if (isInputRange!R)
	{
		static if (is(typeof(R.front) : immutable(typeof(R.front))))
			enum isTransient = false;
		else
			enum isTransient = true;
	}

The idea is that value types are implicitly convertible to immutable,
and value types are non-transient. Reference types, if they can be made
immutable, ensures that calling popFront() will never invalidate them
(by definition of immutable).

To use byLine as an example, if we want to make it non-transient, we
simply have .front return string instead of char[]. Then we can safely
use it with any of the existing algorithms that assume that .front won't
mutate under them when popFront() is called.

Algorithms that *don't* need .front to be persistent can just take
input/forward/whatever ranges as usual.

In any case, whether we decide to do this or not, we NEED to set down a
clear, unambiguous definition of exactly what an input range is, and
what a forward range is, and what kind of assumptions may be safely made
regarding the value of .front. The current ambiguous situation is a
hiding place for subtle bugs, and is unacceptable.

T

-- 
Long, long ago, the ancient Chinese invented a device that lets them see
through walls. It was called the "window".