Range Redesign: Copy Semantics

Tue Jan 23 03:24:21 UTC 2024

On Monday, January 22, 2024 5:57:49 PM MST Paul Backus via Digitalmars-d 
wrote:
> On Monday, 22 January 2024 at 23:07:28 UTC, Jonathan M Davis
>
> wrote:
> > In practice, basic input ranges don't work with the vast
> > majority of algorithms anyway. Some do (e.g. map), but pretty
> > much anything that needs to do more than a simple
> > transformation on the elements ends up needing a forward range.
> > I really don't think that we're going to lose much by forcing
> > basic input ranges to be separate.
>
> Here's a very rough approximation of the number of algorithms in
> Phobos that work with basic input ranges:
>
>      $ grep 'isInputRange' std/algorithm/*.d | wc -l
>      153
>
> For comparison, here are the other range types:
>
>      $ grep 'isForwardRange' std/algorithm/*.d | wc -l
>      141
>      $ grep 'isBidirectionalRange' std/algorithm/*.d | wc -l
>      42
>      $ grep 'isRandomAccessRange' std/algorithm/*.d | wc -l
>      62
>
> So, I would say that your hypothesis is not supported by the
> data, in this case.

I will admit that the number is higher than I would have expected, but most
of those algorithms are on the simple side where the algorithm itself
doesn't care whether the range is consumed, whereas the caller very much
cares. They were written to accept basic input ranges, because the algorithm
itself will work with a basic input range, but that doesn't mean that it can
actually be used with basic input ranges to do anything useful. For
instance, you can use startsWith on a basic input range, but you pretty much
never would, because then you've consumed it - and actually, it's not even
safe to use the rest of the range in generic code after passing it to
startsWith, because it was copied, and the original could be in an invalid
state. The same goes with any function that takes an argument and gives you
a result that doesn't wrap the input range and return it or return a
partially iterated range.

In my experience, it's very rare that it's possible to make anything that
isn't incredibly simple work with a basic input range due to the fact that
looking at it consumes it and makes it impossible to then do anything useful
with it. And it's easy to write simple algorithms which will technically
work with a basic input range but which aren't actually useful with a basic
input range.

So, sure, a surprising number of the algorithms in std.algorithm work with
isInputRange, but a number of them really shouldn't be used with basic input
ranges. Now, a few of them might be more useful if you could rely on a basic
input range having reference semantics, but their composibility in general
is pretty poor.

Either way, I think that the gain from being able to require specific copy
semantics for both forward ranges and basic input ranges is worth it. And
since such a requirement couldn't actually be enforced at compile time
unless we made it impossible to have basic input ranges that were structs
(which would then make reference counting impossible), then a specific
algorithm could choose to add a function to forward ranges via UFCS and
support them if it didn't care about the copy semantics, but like with
@trusted code, I think that it should then be up to the programmer to make
sure that it does the right thing so that basic input ranges can be required
to have reference semantics.

> Foreach works if the elements are rvalues, or the elements are
> lvalues and you use ref for the loop variable. This is the
> correct, expected behavior.

Well, I couldn't get it to work when I tried, but either way, it means that
you can't use foreach on a range of non-copyable types in generic code,
because the way that you use foreach would have to be the same regardless of
the elment type. Of course, you could use static if and branch the code
based on whether the types are copyable or not, but that's yet another
complication to range implementations which are already often far too
complicated with too many static if branches.

> > trying to support non-copyable types will actively make
> > range-base code worse in many cases, because it will require
> > calling front multiple times, and many range-based algorithms
> > do real work in front rather than simply returning a value
> > (e.g. map works that way).
>
> Have you ever actually written generic code that works with
> non-copyable types? I have, and it mostly involves (a) passing
> things by ref, (b) using 'auto ref' and core.lifetime.forward a
> lot, and (c) checking isCopyable!T before trying to copy things.
>
> I have no idea why you think this would require calling front
> multiple times, or doing extra work. Perhaps you could try
> writing out some example code to clarify?

If you can't rely on front returning something that you can copy, then
you're going to have to call front multiple times any time that you need to
access front multiple times - either that or call move on it, which you
can't necessarily do safely, because that would alter the range. Algorithms
that don't have to look at front more than once don't have that problem, but
plenty of algorithms need to do multiple things with front. And often, it's
much better to just store the result of front rather than call front
multiple times, because it's often the case that front does actual work that
you'd rather not do over and over again (and it's particularly bad when you
have stuff like the result of map where the transformation function it was
given allocates on the heap).

So the result of having to worry about non-copyable types is that algorithms
that need to access front multiple times are either going to end up calling
front multiple times, harming performance in many cases, or they're going to
have static if branches to change what they're doing based on the return
type of front, which is going to make them that much more complex to
implement.

> > Personally, I think that we should make it very explicit that
> > non-copyable types are not supported by ranges. They don't work
> > with them now, and I don't think that it's even vaguely worth
> > it to try to make ranges work with them. The cost is too high
> > for too little benefit.
>
> First, as a general rule, I don't think that we, as
> standard-library authors, should be in the business of telling
> our users that their use-cases are illegitimate just because it
> makes our jobs more difficult.
>
> Second...how do you know the benefit is too little? You thought
> earlier that "the vast majority of algorithms" didn't work with
> basic input ranges, even though the actual data shows that isn't
> true. Unless you have concrete evidence, I think it is better to
> avoid making assumptions about which use-cases are important and
> which aren't.

IMHO, it's already way too complicated to write correct range-based code,
and adding non-copyable types into the mix is very much a step too far.
Obviously, there can be disagreement on that, but it's already very
difficult to write range-based code that actually works properly with all
types of ranges. You have to test it with a ridiculous number of types, and
most people just don't do that. Phobos often doesn't do that. And every
capability that we add to ranges means even more static if blocks and even
more variations that need to be tested. The only way to fix that is to
simplify things by putting greater restrictions and requirements on how
ranges work, whereas making them work with a wider variety of things is
going in the opposite direction of that.

Realistically, the best that even might happen here for folks who want
ranges to work with non-copyable types is that a subset of the algorithms in
Phobos will be made to work with them, but third-party libraries are not
going to go to that level of effort (and I don't think that it's all
reasonable to tell them that they should). And Phobos' range-based code will
become even more complicated and that much harder to maintain and make sure
that it works correctly. And we already have too much trouble making sure
that Phobos' range-based code works correctly.

So, my take on it is that supporting non-copyable types is simply not worth
the effort given that it's going to complicate the code and the tests that
much more, and they're already too complex.

And so, if it's my choice, then we simply won't support non-copyable types
with ranges. Obviously, I'm not the sole voice in this, and ultimately, the
choice is going to be up to Walter and Atila (probably Atila given that he's
supposed to be the one in charge of Phobos), so I could definitely be
overruled on this, but I very much want to see it become easier to write
correct range-based code, not have it become harder.

- Jonathan M Davis