isInputRange is false for non-copyable types

Sun Apr 15 22:28:22 UTC 2018

On Sunday, April 15, 2018 15:32:37 Shachar Shemesh via Digitalmars-d wrote:
> On 15/04/18 09:39, Jonathan M Davis wrote:
> > It's extremely common for range-based functions to copy front. Even
> > foreach does it. e.g.
>
> Not necessarily:
>
> foreach(ref e; [S(0), S(1), S(2)]){}
>
> While that would work with foreach, it will not work with anything that
> expects an inputRange (and since D defines a forward range as an
> extension, this is also not a forward range).

The fact that foreach copies the range that it's given makes using ref with
anything other than arrays a bit of an iffy proposition. It will compile
regardless of whether front returns by ref or not (which arguably is a bug),
but it will only affect the original elements if copying the range doesn't
copy the elements - which in the case of a basic input range rather than a
forward range, is generally true, otherwise, it would be a forward range.
But as it stands, using ref with foreach in generic code is not going to go
well. If anything, it's actually a code smell.

For it to work in general, we'd either have to have foreach stop copying
ranges (which would break a _lot_ of code, including code that just uses
dynamic arrays), or we'd have to have some way to mark or detect a range as
working correctly with ref when it's copied - which would mean a new range
trait specifically for that, be that one that tests for a feature (such as
hasLength) or one that tests for a whole new range type. And I suspect that
we would have to use an enum or somesuch to tag the range as having the
right semantics, since I don't think that that can be determined just by
looking at the types of its member variables.

> > If non-copyable types were allowed, then no function which accepted a
> > range would be allowed to copy front without first checking that front
> > was copyable (something that most code simply isn't going to do)
>
> No, that's not correct.
>
> If non-copyable types were allowed, then functions that accept a range
> would fail to compile on non-copyable types if they do a copy.
>
> But the flip side is that if non-copyable would be allowed, a lot of
> functions that current copy would not. There are far too many
> unnecessary copies in the code that serve no purpose, but they are
> invisible, so they are not fixed.

And if a templated function fails to compile if it's given a particular
argument, and that failure is not due to the template constraint, that
usually means that the template constraint is incomplete. And that means
that any and all templated functions which copy front would then have to
test whether front was copyable. Yes, in some cases, that would mean that a
range-based function would be rewritten to not copy front, but either way,
it would mean that _every_ range-based function then has to worry about
non-copyable elements - either by testing for them in its template
constraint or by being written specifically in a way that works with them
and then adding that particular use case to their unit tests. So, allowing
non-copyable types complicates range-based code across the board.

And the reality of the matter is that in many cases, copying front is
actually more efficient. Not all algorithms can call front just once without
copying it, and many ranges (e.g. map) calculate front on each call, meaning
that repeated calls to front can be more expensive than just calling it once
and copying the result. Dealing with that in the most efficient way possible
would probably mean having generic code test whether front returns by ref so
that it can copy it if it doesn't and just keep referring to front directly
if it does, but that would tend to make for some really messy code in any
generic function where front needs to be accessed multiple times. As it
stands, because of functions like map, it's almost certainly better for any
function which accesses front multiple times to copy it.

> > Remember also that a large percentage of ranges that don't wrap dynamic
> > arrays put their elements on the stack, which means that whenever the
> > range is copied, the elements are copied.
>
> If I am guaranteed to be able to copy an input range, what's the
> difference between it and a forward range?
>
> Strike that, I have a better one: If I am allowed to copy an input
> range, what does the "save" function in forward range even mean?

You can copy input ranges as much as you like. You just don't get a
guaranteed independent copy that can be iterated separately. The classic
example would be that a struct copies its members, whereas a class is just a
reference. So with structs, you generally (but not always) have a value
type, whereas with classes, you have a reference type, and copies of each
act completely differently. One is independent, whereas the other is the
same as the original aside from assignment. To make matters worse, a range
could be a pseudo-reference type in that copying it doesn't provide a
completely independent copy, and it doesn't act like a reference type
either.

Dynamic arrays are one example of that. In their case, the result is a range
that's independent for iteration purposes but where mutating any elements
affects the original. Similarly, any time that a range is defined with
multiple members and they're not all value types, you get a pseudo-reference
type. At that point, how mutating the copy affects the original depends on
what each of the members does. It could act like a dynamic array, or you
could end up in a situation where mutating the copy only partially affects
the state of the iteration in the original, in which case, trying to use the
original after the copy has been made would be buggy regardless of whether
the code is generic or not. e.g. there's a real risk of a situation like
that when you do something like have a range wrapping a socket - especially
if it does something like put the bytes it grabs in a buffer in the range
when it reads them. If the state of the iteration is shared between value
types and reference types, you can get all kinds of weird behaviors if you
try to use the original after mutating the copy.

The result of all of this is that in generic code, you have no clue
whatsoever what the semantics are of copying a range around, because it
depends entirely on the copying semantics of whatever range that code is
currently being instantiated with. As such, generic code basically has to
assume that once you copy a range, you can't mutate the original (since it
may or may not be affected by mutating the copy and may or may not be
cleanly affected by mutating the copy), and as such, if you want an
independent copy to iterate over, you need a specific way to get one other
than just copying. That's what save does.

save was introduced to deal with ranges that were classes (though since it
generally requires allocating a new object with every call to save, it's not
necessrily a great idea to implement such a range). However, the issues
related to copying ranges in general weren't as clear at that point in time.
It was just clear that if you wanted to be able to generically get an
independent copy, a solution was needed.

But of course, since _most_ ranges act just like save when they're copied,
save gets frequently forgotten, making it a bit of a thorn in our side. And
the fact that generic code can't ever use a range after copying it (even
after passing it to foreach, since that does a copy) is something that's
frequently missed, causing subtle bugs. So, the fact that ranges weren't
better thought through in the beginning with regards to copying means that
we have something that works quite well in typical use cases but gets fairly
thorny when you're writing generic code (especially if you're then
distributing it to other folks to use with ranges that you've never seen or
heard of).

A lot of the problems that we have with ranges stem from the fact that
they're basically designed around being an abstraction of dynamic arrays,
and any time that a range doesn't act like a dynamic array does, we start
ending up with cracks in how things work. In general, we've gotten it to
work fairly well with just front, popFront, and empty if you can assume that
ranges copy like dynamic arrays do, but of course, many don't, and there,
things start to fall apart. Adding save helped, but it didn't really fix the
problem, since it can so often be ignored and still have code work, and it
leaves the problem of basic input ranges, which by definition, can't be
copied like dynamic arrays are. If a range can't have save, then it's
basically because there's something about its iteration state that either
can't be saved or can't be saved efficiently.

Ideally, we'd be able to do something like require that basic input ranges
be reference types (since they obviously can't be value types, or they'd be
forward ranges) and require that forward ranges act like value types (at
least insofar as copying them copies the iteration state and thus acts like
save). And in that case, if we could have a clean way to differentiate
between input ranges and forward ranges, we could drop the whole save thing
and even rely on how ranges behave when they're copied, but it would make
certain types of ranges that work right now not work (e.g. classes couldn't
forward be ranges without being wrapped in a struct with a postblit
constructor, and basic input ranges would either have to be on the heap or
involve a pointer to the stack). It would probably also require that we
treat input ranges and forward ranges as incompatible, which would be great
in the case where we want to deal with a forward range but not so great in
the case where we want to be able to just iterate without ever having to
look at any part of the range multiple times, since then we'd basically have
to have separate code for input ranges and forward ranges.

So, some redesign of ranges would be great. However, they're highly
integrated into Phobos and heavily used in existing code, making it very
difficult to have any kind of clean deprecation path so that we could
transition from the current state of things to wherever we want to be. We
could do it if we were willing to pretty much outright fork Phobos, but I'm
not sure that it can be done otherwise. And there's still the issue of
foreach in that if we want to change its semantics, that would be a breaking
change in the language itself, meaning that forking the standard library
wouldn't fix that one. If we want to change foreach at all, we'd have to
figure out how to do so in place with a clean deprecation path.

> > foreach(e; range)
> > {
> > }
> >
> > is lowered to something like
> >
> > for(auto __range = range; !__range.empty; __range.popFront())
> > {
> >
> >      auto e = __range.front;
> >
> > }
>
> And that indeed generates a lot of confusion regarding what running two
> foreaches in a row does. This is a bug that already affects more than
> just uncopyable objects.

It was the behavior that that was required in order to have ranges behave
the same as dynamic arrays with foreach without changing how dynamic arrays
work with foreach. Yes, that becomes a problem once you start having ranges
that are reference types or have any range type where copying it is not
equivalent to calling save, but given that ranges were added on top of the
existing behavior, nothing else would have made sense.

Maybe foreach should have been changed for dynamic arrays, but I don't think
that the implications of copying ranges in generic code had been
particularly thought through at the point in time that ranges were added to
foreach, and I don't know how easy it would have been to talk Walter into
changing the behavior of foreach for dynamic arrays at the time. At this
point in time, it would require a way to cleanly transition from the old
behavior to the new without immediately breaking existing code. If someone
can come up with that, and the arguments for changing foreach are compelling
enough, then maybe it can be changed, but as it stands, it's just one of
those annoyances that stems from the fact that ranges tried to emulate
dynamic arrays without requiring that they act like dynamic arrays.

> > Now, generic range-based code really shouldn't ever use a range after
> > it's been copied, since whether mutating the copy affects the original
> > depends on the implementation of the range (and thus generic code
> > should assume that foreach may have consumed the range and call save if
> > it doesn't want that to happen), but lots of code does it anyway,
>
> And allowing ranges with uncopyable members would turn this potential
> deadly run-time bugs into easy to find compile time errors. Sound like a
> great reason to allow it, to me.

Ranges have to be copyable. If they're not, you have to use ref all over the
place, and it becomes impossible to create a function that returns a new
range without putting it on the heap, completely destroying function
chaining unless we want to start allocating a whole bunch of ranges on the
heap, which would destroy performance. It would be great if we could clean
up the semantics of copying ranges, but making it so that you can't rely on
ranges being copied would destroy our ability to use ranges as we do now.

> I'm going to stop responding now, because, frankly, I think you and I
> are having very different discussions.
>
> You seem to be advocating against making *all* ranges support
> non-copyable types, while I'm merely asking that *any* range be allowed
> to support such types.
>
> Current situation is that a range with uncopyable types is not
> considered an input range, and cannot use any phobos range functions,
> including some that it should be able to use without a problem.
>
> There is no reason to block such ranges from using "map" or "any",
> except the fact they require that the type answer "isInputRange" with
> true.

If isInputRange accepted non-copyable element types, then all range-based
code would have to deal with non-copyable element types. Yes, in many cases,
that would mean adding a check to their template constraints which made them
require that the element type be copyable rather than refactoring them to
actually work with non-copyable element types, but all well-written,
range-based code would then have to take non-copyable types into
consideration. And any range-based code that was going to have to work with
non-copyable types would then have to be very particular about what it did.
We'd be complicating all range-based code to handle yet another corner case.
We have too many problems already because of ranges trying to be too many
things at once and not locking down their semantics in ways that either
require things that are not currently required or disallow things that are
currently allowed.

- Jonathan M Davis