isInputRange is false for non-copyable types

Sun Apr 15 06:39:43 UTC 2018

On Sunday, April 15, 2018 07:26:54 Shachar Shemesh via Digitalmars-d wrote:
> import std.range;
> import std.stdio;
>
> struct S {
>      int a;
>
>      @disable this(this);
> }
>
> void main() {
>      writeln(isInputRange!(S[])); // Prints false
> }
>
> The reason, as far as I can tell, is that an input range requires that
> we can do:
>
> auto a = ir.front; // Assuming ir isn't empty
>
> That doesn't work for non-copyable types, for obvious reasons.
>
> With that said, that seems more like a deficiency in the input range
> definition than anything. There is no reason we shouldn't be able to
> iterator type operations over non-copyable types.

It's extremely common for range-based functions to copy front. Even foreach
does it. e.g.

import std.stdio;

struct S
{
    this(int i)
    {
        _i = i;
    }

    this(this)
    {
        writefln("copied: %s", _i);
    }

    int _i;
}

void main()
{
    foreach(e; [S(0), S(1), S(2)])
    {
    }
}

prints

copied: 0
copied: 1
copied: 2

If non-copyable types were allowed, then no function which accepted a range
would be allowed to copy front without first checking that front was
copyable (something that most code simply isn't going to do), and some
functions outright require being able to copy front. They could test for
copyability, but that would complicate a lot of range-based functions for a
relatively rare corner case (much as it obviously matters to any code that
has that corner case).

Remember also that a large percentage of ranges that don't wrap dynamic
arrays put their elements on the stack, which means that whenever the range
is copied, the elements are copied. e.g. if the above example uses
std.range.only instead of an array literal, it actually prints

copied: 0
copied: 1
copied: 2
copied: 0
copied: 1
copied: 2
copied: 0
copied: 1
copied: 2

I'm not sure why it ends up printing each of them 3 times rather than 2, but
the fact that foreach copies any range that it's given means that in order
to have ranges allow non-copyable types, we'd have to change how foreach
worked, which would break a lot of code. Right now,

foreach(e; range)
{
}

is lowered to something like

for(auto __range = range; !__range.empty; __range.popFront())
{
    auto e = __range.front;
}

That requires both copying the range and copying front. We could
theoretically change it so that everywhere that e was used, it would be
replaced with __range.front so that front wasn't copied and probably
wouldn't break code in the process (though it could affect performance), but
if we made it so that the range itself wasn't copied, then instead of
iterating over a copy, foreach would consume the original range, which would
definitely break a lot of code.

Now, generic range-based code really shouldn't ever use a range after it's
been copied, since whether mutating the copy affects the original depends on
the implementation of the range (and thus generic code should assume that
foreach may have consumed the range and call save if it doesn't want that to
happen), but lots of code does it anyway, and if the code isn't generic, it
can work just fine, since then the code can depend on the semantics of that
specific range type instead of having to work with the semantics of
arbitrary ranges. e.g. if code is written specifically for dynamic arrays
rather than ranges in general, then calling save to iterate over a separate
slice would be silly, but if the code doesn't currently call save, and
foreach were changed to not copy the range, then all of a sudden code like

string str = "hello world";
foreach(c; str)
{
}

would consume the dynamic array instead of iterating over a separate slice.

Also, in order for a range to work with a non-copyable type, either front
would have to return by ref or it would have to construct a new object to
return so that it could be moved rather than copied. I expect that quite a
few ranges would have problems with such a restriction, and certainly, even
if a range could have its front changed to return by ref, the range would no
longer be able to guarantee that the caller couldn't mutate the value of
front, which would cause problems in at least some cases.

In addition, it's quite possible that forcing functions to not copy front
would hurt performance. In some cases, by using front directly, you can
avoid a copy (e.g. it returns by ref), but in others, the value of front is
calculated every time it's called. So, depending on the implementation of
the range, calling front repeatedly can be more expensive than simply
copying the value and reusing it. As such, it's not necessarily a good idea
to force range-based code to not copy front. Also, it's not uncommon for
ranges to store the value of front (e.g. it's calculated every time that
popFront is called), and that wouldn't work with non-copyable types. Also,
if such ranges were changed to calculate the value in front to then work
with non-copyable types, whatever calculation they were doing would then
have to be done every time front was called instead of being guaranteed to
only happen once, which would harm performance in many cases.

So, while aside from the issues with foreach, some specific cases might work
with non-copyable types, it causes a lot of complications for ranges in
general, and best case, we'd end up with most existing range-based code
failing to compile with non-copyable types, and we'd have to go to a bunch
of extra work in Phobos to either rework stuff so that it would work with
non-copyable types or make it test for copyability. Either way, most
range-based code outside of Phobos would almost certainly continue to fail
to compile with non-copyable types. We have enough problems already because
of save frequently not being used correctly and how badly reference type
ranges interact with everything because of that. Having to support
non-copyable types would then add yet one more thing that many range
implementations would fail miserably at and that solid range implementations
would have to test for.

And of course, I don't see how we could support non-copyable types even if
we wanted to because of the issues with foreach. Just like with save, we
could take a different approach if we didn't care about breaking code, but
there's no way that we could make a change like that unless we could find a
clean deprecation path to change the behavior, and I have no clue how we
could do that with foreach. Either way, a really compelling case would have
to be made as to why all of the changes required to support non-copyable
types would be worth it, especially considering that this would affect
foreach rather than just a stray function or two in Phobos and as such would
potentially affect almost every D program in existence.

IIRC, when this has come up previously, Andrei ruled that supporting
non-copyable types simply wasn't worth the extra complication, though a
quick search doesn't turn up anything where he talked about it. So, I can't
verify that at the moment.

- Jonathan M Davis