Transience of .front in input vs. forward ranges

Sun Nov 4 20:20:58 PST 2012

On Sun, Nov 04, 2012 at 07:21:13PM -0800, Jonathan M Davis wrote:
[...]
> Now, if we say that we need transient fronts for some input ranges but
> don't want to add an isTransient trait or something similar, and so
> we're going to say that transient fronts are only acceptable for input
> ranges and that algorithms have to assume that front can be transient
> for input ranges, then fine. I don't really like it, but I'm
> definitely in favor of simplifying things here.

I still think deadalnix's proposal is the simplest and most elegant so
far. To summarize:

- We define a .transient property for ranges that default (via UFCS) to
  simply return the range:

	@property R transient(R)(R range) { return range; }

- We modify File.byLine() to be non-transient, and provide the current
  transient version as a .transient property:

	struct ByLine(C) {
		...
		C[] front() { return buffer.dup; }
		auto transient() {
			struct TransientByLine(C) {
				// current implementation of ByLine
			}
			return TransientByLine!C();
		}
	}

- Range transformations that cannot handle transient ranges do not need
  to be changed. UFCS will cause .transient on the result ranges to just
  alias the (non-transient) range itself. This is the default situation.

- Range transformations that *can* handle transient ranges will return
  a range that has a .transient property. For example, if we make joiner
  transient-capable:

	auto joiner(R)(R ror) {
		struct Joiner {
			...
			auto front() {
				// non-transient implementation
			}

			auto transient() {
				struct TransientJoiner {
					auto front() {
						// transient implementation
					}
				}
			}
		}
		return Joiner(ror);
	}

  The important thing to note here, is that joiner returns a *non*
  transient range by default. You do NOT get a transient range out of it
  unless you specifically ask for .transient on the resulting range.

- Range consumers decide whether or not they can handle transience. If
  they can't, nothing needs to be done: the current implementation
  already works. If they can, then they simply ask for .transient from
  the range passed in. For example, canFind() is such a candidate:

	bool canFind(R)(R r1) {
		// This line asks for a transient range. Remember, if r1
		// isn't transient, this automatically just becomes r1
		// itself.
		auto r = range.transient;

		while (!r.empty) {
			... // implementation here
		}
		return result;
	}

  The important thing to note here is that if canFind gets a
  non-transient range, then nothing different happens, it just runs as
  usual. It only gets a transient range if r1 is capable of transience.

Why do I think this is simpler than the other proposals (including my
own)? Because:

(1) We don't even have to change any existing code for everything to
work correctly (besides making byLine non-transient and implementing
the default .transient via UFCS). Everything by default is
non-transient, and all existing code continues to work. They may run a
bit slower due to byLine .dup'ing the internal buffer, but they will run
*correctly*.

(2) Existing algorithms can be updated one-by-one to take advantage of
transience. If a transient-capable algorithm isn't updated, then it just
defaults to using non-transient ranges: it will not run faster, but it
will run *correctly*.

Once an algorithm is updated, then transient ranges can be used where
supported, so you get speedups where the algorithm has specifically been
designed to deal with transient ranges.

In short, nothing needs to be done immediately. We can take our time to
update existing algorithms one by one. In the meantime, correctness and
safety is not compromised. Just a little speed sacrifice is all.

(3) New algorithms that are written without transience in mind will
automatically work correctly -- because all ranges are non-transient by
default. So new code doesn't even need to worry about transience. They
can be updated later to take advantage of transience, if we decide to do
so.

(4) The *user* doesn't have to care about transient ranges. It is the
range consumers that decide whether or not they can consume a transient
range, and they are the ones that decide whether a transient range is
actually used. User-defined ranges will be non-transient by default; if
they know what they're doing, then they can define .transient to allow
reusing buffers. But if not, everything falls back to a safe (if a tad
slower) default.

IOW, this change is non-intrusive, doesn't demand massive code changes,
can be done incrementally, and gets rid of cognitive burden on user code
to handle transient ranges carefully. The user doesn't even need to know
there's such a thing as a transient range and everything will still work
correctly. If they use transient-aware Phobos range consumers, they get
a speed boost without even needing to know transient ranges were
involved.

What's not to like about this proposal?

> However, that means that we need to be very clear about the fact that
> input ranges can have transient fronts and that algorithms cannot
> assume that their fronts are not transient (something that only you
> appear to have thought was clear). We then either have to change a
> bunch of algorithms so that they require forward ranges, or we need to
> create a trait which can determine whether front is transient in it
> least some cases (it would still be claiming that front is transient
> in some cases where it isn't, because it can't know for sure, but we'd
> at least be able to figure it out in some cases), and make algorithms
> use that in their constraints when they deal with input ranges.
> std.array.array would be a prime case where an algorithm would have to
> be changed, because it can't function with a transient front, and it's
> a prime example of a function that you'd normally expect to work with
> an input range.  And this _will_ break code. It'll allow us to fix the
> ByLine problem though.
[...]

Using deadalnix's proposal, std.array.array would be a non-issue. Since
ranges will be non-transient by default, array() will Just Work with no
further effort.

No existing algorithms need to be updated. If they aren't written to use
.transient, then they will just get the default non-transient range, and
work correctly as-is.

No existing code will break. They may just run slightly slower due to
byLine() being non-transient now, but in no case will there be broken
code. We can just fix byLine() to be non-transient right now, without
doing anything more, and everything will Just Work.

We will just have to update the docs to state that, by default, ranges
must be non-transient.

Only when we want to optimize certain algorithms, will we even need to
worry about .transient. Unoptimized algorithms continue to work with no
change.

The .transient property doesn't even need to be propagated: if it is not
propagated, then everything just defaults to the non-transient version
of the range. The optimization only kicks in when we specifically
implement .transient for that range.

This is much less intrusive than any other alternative I've seen so far.

T

-- 
"I'm not childish; I'm just in touch with the child within!" - RL