isInputRange is false for non-copyable types

Mon Apr 16 09:01:34 UTC 2018

On Monday, April 16, 2018 07:15:36 Shachar Shemesh via Digitalmars-d wrote:
> On 16/04/18 01:28, Jonathan M Davis wrote:
> > The fact that foreach copies the range that it's given makes using ref
> > with anything other than arrays a bit of an iffy proposition. It will
> > compile regardless of whether front returns by ref or not (which
> > arguably is a bug), but it will only affect the original elements if
> > copying the range doesn't copy the elements
>
> You keep referring to ranges that contain the data, but I am racking my
> mind and failing to find examples.

Pretty much any generative range does. e.g. iota contains data, and each
element is then calculated from the current value. The same goes for stuff
like random number generators. It's quite common for ranges to have popFront
calculate the next front and store it as a member rather than have front do
the calculation, since then multiple calls to front don't incur the cost of
whatever has to be done to get the next element, and they can just return
the same value each time. In fact, that's generally considered to be best
practice. It's true that most ranges do not contain every element directly
in a struct, but some do (e.g. std.range.only does, which is a way to
generate a range from a list of elements without allocating anything on the
heap), and many ranges at least store the value for front. So, I'd say that
a fairly large percentage of ranges store at least one value directly in the
range object.

> Plus, as I've said before, if ranges are expected to contain data,
> copying them seems like an incredibly bad idea. The whole idea behind
> casually copying the range about is that it's cheap to do.

Yes, it's generally assumed that it's cheap to copy a range, but it's also
generally assumed by D code in general that copying is cheap. Relatively few
functions in Phobos accept arguments by ref. Some do accept by auto ref -
either to forward the refness or to avoid a copy - but since passing by ref
doesn't work with rvalues, most code in Phobos doesn't do so unless it's
intended to mutate the argument and give the caller access to the result. I
would strongly suggest that anyone who is looking to put an object that is
expensive to copy in a range consider making it so that that object is on
the heap rather than directly in the range.

> > But as it stands, using ref with foreach in generic code is not going to
> > go well. If anything, it's actually a code smell.
>
> You are conflating ref with owning data, which I find unsubstantiated.

My point is that if you have

foreach(ref e; range)
{
}

then unless range is a dynamic array, the odds are very high that it's a
bug, because then you risk problems like

foreach(ref e; range)
{
    e = foo;
}

Outside of dynamic arrays, that assignment is usually pointless. ref implies
that it's going to actually affect the range that you passed to foreach, but
in most cases, it won't. Because the range is copied, in order for ref to
affect elements of the range that is passed to foreach, the range must be a
reference type or must act like a dynamic array in the sense that each copy
refers to exactly the same objects - which is true for some ranges (e.g.
those over a container) - but for many ranges, it isn't (e.g. generative
ranges aren't usually backed by anything). Also, the front of most ranges
does not return by ref, which would be required for the ref on foreach to
actually allow you to mutate the element. As such, I'd argue that the fact
that ref is allowed on foreach for all ranges is a bug. At minimum, it
should require that front return by ref, and even then, depending on the
range, it won't necessary work to assign to the element and have it affect
the range that was copied when it was passed to foreach. So, while it will
work in _some_ cases to pass a range to foreach and then use ref, in
general, I would consider it a code smell, because the odds are high that it
does not do what the programmer who wrote it assumed that it would do.

> > And if a templated function fails to compile if it's given a particular
> > argument, and that failure is not due to the template constraint, that
> > usually means that the template constraint is incomplete.
>
> I do not accept this argument outright. In fact, our experience at Weka
> has drove us to the exact opposite direction.
>
> We deliberately set up functions to include invalid cases and then
> either let them fail because of lacking constraints, or even set up a
> static assert to fail the function generation rather than fail matching.
>
> We found that the error messages produced by the alternative are simply
> too difficult to parse out and figure what went wrong.

For better or for worse, it's generally considered a bug in Phobos if it's
possible to write code that gets past the template constraint and then does
not compile. There are reasons why that can be a problematic approach, but
that is the approach that we've taken thus far. As such, if isInputRange
started allowing non-copyable elements, it would be expected that all
range-based functions in Phobos would then have to either require
copyability in their template constraint or actually work with non-copyable
types.

> > it would mean that _every_ range-based function then has to worry about
> > non-copyable elements
>
> No, just those in libraries. The same way they have to worry about big O
> complexity, type safety, @nogc and a bunch of other things.
>
> When you write a library, you have to think about your users. This
> doesn't change anything about that.

It changes it by adding yet another complication to worry about when
implementing ranges and range-based functions that no one implementing
range-based code currently has to worry about. And writing bug-free generic,
range-based code can already be surprisingly difficult without adding yet
another complication into the mix.

> > Not all algorithms can call front just once without
> > copying it
>
> Huh? Where did that come from?
>
> Phobos already assumes that calling front needs to be an O(1) operation.
> You can call front as many times as you like. Why would you assume
> otherwise?

Sure, you can call front as many times as you want. But in some cases, it's
more expensive to call front multiple times than it is to call it once and
store the result. Also, while the result of front may be equivalent every
time, it may not be exactly the same object every time. For instance,

auto r2 = range.map!(a => to!string())();

is going to to result in a range that allocates a string every time that
front is called. This sort of thing is not an uncommon thing to do with map,
and while many ranges are implemented to do most of their work in popFront
(thereby making calling front cheap), there's no requirement that they do
so. So, there are ranges that do their work in front rather than popFront,
and in the case of map, it actually has to be done in front, because map can
be random-access, and using popFront to calculate the value only works for
front, not arbitrary elements. In such cases, calling front multiple times
is definitely worse than just calling it once and storing the result. I
think that pretty much the only time that it's better to call front multiple
times rathre than calling it once and storing the result is when it returns
by ref, because if it doesn't return by ref, you're either going to get a
copy every time you call front, or the range is going to be creating a new
object every time you call front. In those cases, storing the result of
front and reusing it would be cheaper. And since most ranges do not return
by ref, most ranges are not avoiding copies or any expensive calculations by
calling front multiple times. If anything, they're incurring them.

Now, arguably, in the cases where a generic function needs to access front
multiple times, it would be best if it tested front for refness and then
used static ifs so that it stored the result of front when it didn't return
by ref and called it multiple times when it did, but that would get ugly
fast, and I'm not aware of any Phobos code that currently does that.

> > The result of all of this is that in generic code, you have no clue
> > whatsoever what the semantics are of copying a range around, because it
> > depends entirely on the copying semantics of whatever range that code is
> > currently being instantiated with. As such, generic code basically has
> > to
> > assume that once you copy a range, you can't mutate the original (since
> > it may or may not be affected by mutating the copy and may or may not
> > be cleanly affected by mutating the copy),
>
> Jonathan, I'm sorry, but what you are describing is called "a bug".
>
> Input ranges provide no guarantees about copying the range itself. If
> you do it and expect *anything* (which it seems you do), you have a bug.
> If you need to copy the range itself, you absolutely need to require a
> forward range.
>
> A forward range with ref front absolutely promises that both iterations
> reference the same elements.
>
> I find myself sincerely hoping that you are making these examples up in
> order to win this discussion turned into an argument for some reason,
> instead of the way phobos is actually written, because if you're right
> phobos is *broken*.
>
> The good news is that if Phobos is broken, this is a great way to flush
> out the places it's broken. Let's add UTs calling all Phobos functions
> accepting input ranges (as opposed to forward ranges) with non-copyable
> ranges (i.e. - the range itself is non-copyable) and fix the compilation
> errors. This shouldn't take more than a few days to do.

I'm not making anything up. Very little range-based code passes ranges by
reference. They're copied by value all the time, even if they're not forward
ranges. foreach itself copies a range before iterating over it. Yes, if you
want to iterate over two independent copies, you need a forward range, and
you need to call save, but ranges are copied constantly. It's just that it's
then a bug to do anything with the original after the copy has been made. As
it stands, a range that cannot be copied is going to work with almost
nothing in Phobos and would be borderline useless. And adding ref everywhere
to range-based functions to change that would destroy our ability to chain
function calls with ranges. Also, a large percentage of range-based
functions return a range that wraps another range, and that means copying
the range that's passed in. It simply wouldn't work with many functions to
accept a range by ref without then copying the range inside the function
anyway. The assumption that input ranges are copyable is built into almost
everything range-based in all of Phobos. The rare exceptions are functions
like std.conv.parse which are specifically intended to mutate the range
that's passed in rather than copying it.

> >> And allowing ranges with uncopyable members would turn this potential
> >> deadly run-time bugs into easy to find compile time errors. Sound like
> >> a
> >> great reason to allow it, to me.
> >
> > Ranges have to be copyable.
>
> And why would a range iterating over uncopyable objects not be copyable?
> How are the two related?

Because many ranges do stuff like store the value of front after calculating
it with popFront or store some set of values to calculate new values. No
such range would work with non-copyable element types even if isInputRange
allowed them. Now, that doesn't mean that _some_ ranges couldn't be made to
work with non-copyable element types, but the fact that ranges must be
copyable means that any range that worked with non-copyable element types
could not be a struct that stored any of its elements as a direct member.

- Jonathan M Davis