Range Redesign: Copy Semantics

Mon Jan 22 22:24:59 UTC 2024

On Monday, January 22, 2024 8:41:35 AM MST Atila Neves via Digitalmars-d 
wrote:
> On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis wrote:
> > I've been thinking about this for a while now, but with the
> > next version of Phobos which is in the early planning stages,
> > we really should do some redesigning of ranges. Most of their
> > API is just fine how it is, but there are some aspects of it
> > which really should be changed if we want them to be better
> > (the most obvious one being the removal of auto-decoding). But
> > what I'd like to discuss specifically in this thread is fixing
> > - and defining - the semantics of copying and assigning to
> > ranges. Specifically, the semantics of stuff like
> >
> > [...]
>
> I don't think I've ever encountered a situation where reference
> ranges would have been desirable - I've never used one.
>
> I think that `.save` was a historical mistake, and that ranges
> that can be copied are forward ranges. Something like a range
> reading from stdin or a socket would/should disable the
> copy/postblit constructors.

Once you disable the copy/postblit constructor, you can't generally pass the
range around or wrap it in another range - at least not without explicit
calls to move. ref parameters and return values would solve some of that,
but it wouldn't solve all of it (wrapping in particular, which ranges do
heavily, won't work with ref). I would expect a range that can't be copied
to be too annoying to even bother with. And in fact, they can't even work
with foreach, because foreach copies the range that it iterates over (and
has to in order to have the semantics match what happens with dynamic arrays,
though if we separate basic input ranges from forward ranges, then we can
change what happens with basic input ranges). IMHO, if we want to go the
route of trying to make them uncopyable, we pretty much might as well just
get rid of the concept of basic input ranges and require that they use
opApply (which isn't an entirely bad idea given how limited basic input
ranges are in practice anyway).
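
To make the point about noncopyable ranges concrete, here is a minimal
sketch. `NoCopy` and `consume` are hypothetical names invented for this
illustration and are not part of Phobos; the only library call is
`core.lifetime.move`. It assumes a compiler where range-based foreach copies
an lvalue range, as described above.

```d
import core.lifetime : move;

// Hypothetical basic input range with copying disabled.
struct NoCopy
{
    int i;
    @disable this(this);

    bool empty() const { return i >= 3; }
    int front() const { return i; }
    void popFront() { ++i; }
}

// Hypothetical algorithm that takes its range by value, as most
// range-based functions do.
int consume(R)(R r)
{
    int sum;
    for (; !r.empty; r.popFront())
        sum += r.front;
    return sum;
}

void main()
{
    NoCopy r;

    // Passing an lvalue by value needs a copy, so this is rejected.
    static assert(!__traits(compiles, consume(r)));

    // foreach copies the range it iterates over, so this is rejected too.
    static assert(!__traits(compiles, { foreach (e; r) {} }));

    // An explicit move is the only way to hand the range over.
    assert(consume(move(r)) == 3);
}
```

Every hand-off needs a `move`, and any wrapper that stores the range would
have to do the same internally.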

Basic input ranges are inherently either reference types or pseudo-reference
types, and they can be made to work cleanly if we just require that they be
reference types. Those are very different semantics from forward ranges,
which are value types with regards to their iteration state (assuming that
we require that copying them results in copies which can be independently
iterated rather than having save). And that's why I'm arguing that we should
split the two concepts rather than trying to treat forward ranges like an
extension of basic input ranges - especially since basic input ranges are
extremely hamstrung anyway, because you can't actually get an independent
copy of them, and in practice, most stuff needs that. And if we did go with
a solution for basic input ranges which made them non-copyable, then they
couldn't be used with the same code that uses forward ranges anyway, so we
might as well just give them a different API and stop treating them like the
same thing when they really aren't.
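
The difference between the two sets of semantics can be sketched as follows.
`Counter` is a hypothetical class made up for this illustration; the dynamic
array half uses the existing `front` and `popFront` primitives from
`std.range.primitives`.

```d
import std.range.primitives : front, popFront;

// Hypothetical basic input range implemented as a class, i.e. a
// full-on reference type.
final class Counter
{
    private int i;

    bool empty() { return false; }
    int front() { return i; }
    void popFront() { ++i; }
}

void main()
{
    // Value semantics for iteration state: the copy of a dynamic array
    // can be advanced without affecting the original.
    int[] a = [1, 2, 3];
    auto b = a;
    b.popFront();
    assert(a.front == 1);
    assert(b.front == 2);

    // Reference semantics: both variables refer to the same range, so
    // advancing one advances the other.
    auto c = new Counter;
    auto d = c;
    d.popFront();
    assert(c.front == 1);
    assert(d.front == 1);
}
```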

> Has anyone here used the class-based ranges?

Yes. Symmetry uses them. In general, it's best not to use them for forward
ranges, but when you're dealing with code where you can't determine the type
at compile time (like with an interpreter), then you need some kind of
runtime polymorphism, and you really don't have much choice. That being
said, we can support that by wrapping such classes in structs and ensuring
that the structs have the correct copy semantics rather than allowing
forward ranges that are classes. So, it should be quite possible to get rid
of save without losing any functionality, and we'd get much cleaner copy
semantics in the process, making it so that forward ranges in general behave
more like dynamic arrays.
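
One way the struct wrapper could look is sketched below. All of the names
(`IntRange`, `CountUp`, `ValueRange`, `clone`) are hypothetical and exist
only for this illustration; this is not a proposed Phobos API. The sketch
assumes the polymorphic implementation can produce an independent duplicate
of itself, and it uses a postblit to do so whenever the wrapper is copied.

```d
// Hypothetical runtime-polymorphic range interface.
interface IntRange
{
    bool empty();
    int front();
    void popFront();
    IntRange clone(); // independent duplicate of the iteration state
}

final class CountUp : IntRange
{
    private int cur, end;

    this(int cur, int end) { this.cur = cur; this.end = end; }

    bool empty() { return cur >= end; }
    int front() { return cur; }
    void popFront() { ++cur; }
    IntRange clone() { return new CountUp(cur, end); }
}

// Struct wrapper that gives the class value-type copy semantics.
struct ValueRange
{
    private IntRange impl;

    this(IntRange impl) { this.impl = impl; }

    // Runs after the bitwise copy: replace the shared reference with
    // an independent duplicate.
    this(this)
    {
        if (impl !is null)
            impl = impl.clone();
    }

    bool empty() { return impl.empty; }
    int front() { return impl.front; }
    void popFront() { impl.popFront(); }
}

void main()
{
    auto a = ValueRange(new CountUp(0, 3));
    auto b = a; // independent copy, no save required
    b.popFront();
    assert(a.front == 0);
    assert(b.front == 1);

    // foreach iterates over a copy, so a is unchanged afterwards, just
    // as with a dynamic array.
    int sum;
    foreach (e; a)
        sum += e;
    assert(sum == 3);
    assert(a.front == 0);
}
```

The cost is an allocation per copy, which is the price of the runtime
polymorphism rather than of the wrapper itself.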

For basic input ranges, it should be perfectly fine for them to be classes -
desirable even - simply because you can't actually get independent copies of
them. Using a struct with pseudo-reference copy semantics is just begging
for bugs, because it means that mutating the copy puts the original in an
invalid state. If we instead require that all basic input ranges be full-on
reference types (which could include structs and pointers to structs so long
as they have the correct semantics), then we can actually rely on sane,
consistent copy semantics for basic input ranges. Those semantics would just
be different from the copy semantics of forward ranges, which would also be
true if we were somehow able to make basic input ranges noncopyable instead,
though I really don't think that that will work.
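
The pseudo-reference problem can be sketched as follows. `PseudoRef` is a
hypothetical struct invented for this illustration: part of its state (the
underlying source) is shared between copies, while the cached front element
is duplicated by each copy.

```d
// Hypothetical basic input range with pseudo-reference semantics.
struct PseudoRef
{
    private int* source; // shared by every copy
    private int cached;  // private to each copy

    this(int* src)
    {
        source = src;
        cached = (*source)++; // read the first element
    }

    enum bool empty = false;
    int front() const { return cached; }
    void popFront() { cached = (*source)++; }
}

void main()
{
    int stream = 0; // stands in for stdin, a socket, etc.
    auto a = PseudoRef(&stream);
    auto b = a; // copies the cache, shares the source

    b.popFront();
    assert(b.front == 1);

    // The original still reports a stale front...
    assert(a.front == 0);

    // ...and when it advances, it skips the element that b consumed.
    a.popFront();
    assert(a.front == 2);
}
```

With a class, or a struct that is nothing but a pointer to this state, `a`
and `b` would always agree with each other.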

- Jonathan M Davis