Range returning an array

Tue Apr 9 13:37:19 PDT 2013

On Tue, Apr 09, 2013 at 10:03:01PM +0200, Joseph Rushton Wakeling wrote:
> Hello all,
> 
> I have a situation in a program I'm writing where it's convenient to
> define a range whose return value is an array.  Something like:

Congratulations! You've created your first transient range. :)

[...]
> Now, the problem with a design like this is that it's unsafe, in the
> sense that, let's say I do something like,
> 
> 	T[] output;
> 	foreach(opvar; mySim)
> 		output = opvar;
> 
> ... then, at the end of this loop, output will not be the same as it
> was set to with the last assignment, because popFront() will have been
> called one last time prior to the "empty" condition being set to true.

Yes, this is the typical behaviour of transient ranges.

> Now, there are a number of ways I can think of to avoid this.  One is
> to return var.dup rather than var from front(), but this is
> undesirable because it will hit performance in a big way.

Right, this is to make it a non-transient range, which has the
associated performance hit.

> Another might be to calculate the updated value of var in a temporary
> array, and update var itself only if the diff is greater than the
> convergence criterion.

Personally, my preference is to let the user code call .dup to copy the
value instead of aliasing it. The crux of the issue is that returning an
array is by reference, so it's equivalent to returning a reference.  If
you were calling a ref function, you'd be careful about assuming whether
its value would change afterwards. The problem is that the range API
conflates by-ref and by-value .front's, and there's currently no way to
tell them apart aside from knowing the specifics of the range you're
dealing with.

> However, I'm curious enough about this problem that I thought I'd
> throw it open to everyone else to see if anyone has a better idea than
> one of these.  The challenge here is to ensure really good performance
> while not risking bad or incorrect results leaking outside of the
> range.
[...]

Some time ago on the main D mailing list, somebody (deadalnix?)
suggested extending the range interface to have a .transient function
that returns a transient version of the range. The idea goes something
like this:

	struct MyRange {
		auto front() {
			// Always .dup by default so there are no
			// surprises
			return frontImpl.dup;
		}
		auto transient() {
			struct TransientWrapper {
				MyRange r;

				// User specifically asks for high
				// performance transient range, so no
				// .dup here, they are responsible for
				// dealing with the aliasing effect
				// themselves
				auto front() {
					return r.frontImpl;
				}
			}
			return TransientWrapper(this);
		}
		... // other range functions
	}

Here's how you use it:

	void slowButSafeCode(MyRange r) {
		T[] x;

		// Use the range directly: everything is .dup'd by
		// default, it's slow but always safe.
		foreach (e; r) {
			x = e;	// this will be correct by default
		}
	}

	void fastTransientCode1(MyRange r) {
		// Specifically ask for transient range: we take
		// responsibility to call .dup if necessary
		auto tr = r.transient;

		foreach (e; tr) {
			// We're not storing the transient data, so no .dup
			doSomething(e);

			// Here we selectively .dup the elements we need
			// to cache.
			if (complexCondition(e))
				cache(e.dup);
		}
	}

The idea is that the range should be non-transient by default (so that
user code ignorant of transience will always be correct by default), but
if the user wants the speed, they can explicitly ask for a transient
range and take responsibility for using it correctly.

T

-- 
Fact is stranger than fiction.