Transience of .front in input vs. forward ranges

Mon Nov 5 16:49:42 PST 2012

On Mon, Nov 05, 2012 at 11:51:35PM +0200, Andrei Alexandrescu wrote:
[...]
> >With Andrei's proposal, all code that assumes transient .front with
> >input ranges are broken by definition.
> 
> I think this should be: all code that DOES NOT assume transient
> .front with input ranges is broken.

Then std.array.array is broken by definition, and cannot be implemented
with anything less than a forward range. This will very likely break a
lot of existing code.

> >I've looked over std.algorithm -- there are a *lot* of places with
> >this assumption. Instead of getting correct default behaviour, you
> >get a whole bunch of broken code with buggy behaviour, unless you
> >first rewrite a lot of code in std.algorithm (and probably std.range
> >as well).
> 
> Could you please put together a list of these algorithms so we have
> it? Thanks.

This is probably an incomplete list, but it's a start:

std.algorithm.reduce - when no seed is given.
std.algorithm.joiner - both variants (I have a fix for the second variant)
std.algorithm.group
std.algorithm.minCount
(std.algorithm.minPos - but takes forwardRange, so probably OK)
(std.algorithm.Levenshtein - takes forwardRange, probably OK)
(std.algorithm.makeIndex - takes forwardRange, probably OK)
std.algorithm.splitter - tries to take slices without checking if range is sliceable
std.algorithm.uniq
std.algorithm.topNCopy - copies .front of input range.
std.algorithm.NWayUnion - copies .front of (possibly input) outer range into a heap
(std.algorithm.largestPartialIntersection - uses NWayUnion)
std.array.array - tries to copy input range
std.array.insertInPlace - tries to copy input range
std.array.join - tries to copy input range
std.stdio.writeln - there's too much code here so I didn't narrow it
	down, but testing with a transient range shows that it doesn't
	work correctly. This probably implicates std.format somewhere.

I didn't check the other Phobos modules, but basically any code that
currently takes an input range needs to be audited for this issue.

[...]
> >How is this any better than deadalnix's solution?  From what I can tell,
> >it's a lot worse, for all of the above reasons.
> 
> I think we should start from here: the .transient proposal will not
> be accepted because it is too complex. Consider it a baseline that
> other proposals would be evaluated against. Then let's see how to
> devise a robust, simple solution.
[...]

I still think forcing input ranges to be transient is oversimplifying
the issue. Whether a range is transient is orthogonal to whether you can
.save the current position or whether you can traverse it in two
directions. Trying to conflate the two only leads to leaky abstractions.

The only real solution IMO is to address the issue head-on: either
recognize transience as an inherent property of ranges, or (re)define
ranges to exclude all transience.

If we recognize transience as an inherent property of ranges, then no
matter what we do, there's going to be complications, because you'll
always have to deal with two cases. Either an algorithm can handle
transience, or it can't. If it can't, there needs to be a way to make it
so that it will reject transient ranges somehow. With deadalnix's
proposal, the overhead of doing this is more or less minimized, but it
is still there.

If we exclude all transience, then there is no additional complication.
We just have to fix the current code and make it crystal clear in the
documentation that .front must always be persistent no matter what. The
disadvantage here is that some algorithms, like find(), don't *need* the
persistence of .front, so you lose generality. Any code that works with
transient ranges will have to stick with foreach, or reinvent
std.algorithm. With .transient proposal, this is the default behaviour,
except that we have the *option* of using transient ranges if both the
range type and the algorithm can handle it.

So the .transient proposal is pretty close to a comfortable balance
between the two extremes. If it is still considered too complicated,
then I'd like to hear what other proposals will address all the
underlying issues in just as nice a way (or an even nicer way), without
compromising this balance between the two extremes.

T

-- 
In order to understand recursion you must first understand recursion.