How is chunkBy supposed to behave on copy

Fri Mar 20 23:50:26 UTC 2020

On Friday, March 20, 2020 4:15:18 PM MDT Ben Jones via Digitalmars-d wrote:
> On Friday, 20 March 2020 at 22:11:49 UTC, H. S. Teoh wrote:
> > On Fri, Mar 20, 2020 at 03:59:57PM -0600, Jonathan M Davis via Digitalmars-d wrote:
> >> [...]
> >
> > [...]
> >
> >> [...]
> >
> > Yes, that's right.  Actually, for by-value ranges the act of
> > passing them as an argument to a range function in the first
> > place already copies them.  The catch is really that once this
> > happens, the caller or whoever retains the original copy should
> > no longer assume that the range remains in the same place as
> > before.  For some ranges this is true, but for other ranges
> > this assumption is invalid, and will lead to incorrect results.
> >
> >> [...]
> >
> > [...]
> >
> > +1.
> >
> >
> > T
>
> So range "copy" is really what a C++ person would call range
> "move" ?  It might be a copy, or it might invalidate the
> original, depending on the type?

You more or less have to view it that way, though no move actually takes
place. The problem is that the semantics of what happens when a range is
copied are completely implementation-dependent, so you cannot depend on
them and basically have to consider the range to be in an invalid state
once it's been copied, even if it's not technically in an invalid state.

If a range has by-value copying semantics, then copying it gives you the
same result as calling save. If it's a full-on reference type, then mutating the copy
mutates the original. And worst of all, if you have a pseudo-reference type,
then you can end up in a state where mutating the copy mutates only part of
the original, effectively putting it in an invalid state. But even if you
somehow never had to worry about pseudo-reference types, the mere fact that
some ranges have by-value copying semantics whereas others are full-on
reference types makes it so that you can't rely on what happens to a range
once it's been copied. And if code isn't tested with, at minimum, both
value-type ranges and reference-type ranges, the odds are _very_ high that
it won't correctly handle ranges that aren't value types.
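To make the three cases concrete, here's a small D sketch. RefRange and
PseudoRef are hypothetical types made up for illustration: a dynamic array
stands in for a value-type range, a class for a full-on reference type, and a
struct that shares its source through a pointer but caches front in each copy
for a pseudo-reference type. It assumes nothing beyond std.range.primitives.

```d
import std.range.primitives : empty, front, popFront, isInputRange;

// Hypothetical reference-type range: a class, so copying copies only the reference.
final class RefRange
{
    int[] data;
    this(int[] d) { data = d; }
    @property bool empty() const { return data.length == 0; }
    @property int front() const { return data[0]; }
    void popFront() { data = data[1 .. $]; }
}

// Hypothetical pseudo-reference range: the source is shared between copies
// (through a pointer), but the current front is cached separately in each copy.
struct PseudoRef
{
    private int[]* source; // shared by every copy
    private int cached;    // owned by each copy
    private bool done;

    this(int[]* src) { source = src; advance(); }

    // Pulls the next element out of the shared source into this copy's cache.
    private void advance()
    {
        done = (*source).length == 0;
        if (!done)
        {
            cached = (*source)[0];
            *source = (*source)[1 .. $];
        }
    }

    @property bool empty() const { return done; }
    @property int front() const { return cached; }
    void popFront() { advance(); }
}

void main()
{
    static assert(isInputRange!RefRange && isInputRange!PseudoRef);

    // 1. Value semantics: a slice copy is independent, same as save.
    int[] a = [1, 2, 3];
    auto aCopy = a;
    aCopy.popFront();
    assert(a.front == 1);
    assert(aCopy.front == 2);

    // 2. Reference semantics: popping the copy pops the original.
    auto c = new RefRange([1, 2, 3]);
    auto cCopy = c;
    cCopy.popFront();
    assert(c.front == 2);

    // 3. Pseudo-reference: the copy consumed part of the shared state,
    //    but the original's cached front is now stale.
    int[] buf = [1, 2, 3, 4];
    auto p = PseudoRef(&buf);
    auto pCopy = p;
    pCopy.popFront();
    assert(pCopy.front == 2);
    assert(p.front == 1); // stale
    p.popFront();
    assert(p.front == 3); // 2 was eaten by the copy
}
```

Note that in the third case the original never sees element 2 at all, which
is exactly the "mutating the copy mutates only part of the original" problem.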

Really, forward ranges should all have by-value copying (thus requiring that
classes be wrapped in structs if they're going to be forward ranges), and
save should be abolished, but that requires a major redesign and likely
would only happen if we did some sort of Phobos v2 (as has occasionally been
discussed). And exactly what should happen with basic input ranges is not
clear, because while ideally, you'd just require that they have full-on
reference semantics, that tends to mean that you're forcing them to be
allocated on the heap, which isn't really the sort of thing that we want to
force if we can avoid it. So, while it's clear what we should do with some
aspects of the range API if we have the opportunity to redesign it, there
are still issues that would have to be sorted out.
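As a rough sketch of what wrapping a class in a struct could look like with
the current language, assuming a compiler with copy constructor support
(dmd 2.086 or later): ClassRange and ValueRange are hypothetical names, and
the only idea here is that the struct's copy constructor calls save, so that
copying the wrapper behaves like copying an array. It's an illustration of the
value-semantics goal, not a proposal for the redesign, and it says nothing
about basic input ranges.

```d
import std.range.primitives : isForwardRange;

// Hypothetical class-based forward range: copying it copies only the reference.
final class ClassRange
{
    int[] data;
    this(int[] d) { data = d; }
    @property bool empty() const { return data.length == 0; }
    @property int front() const { return data[0]; }
    void popFront() { data = data[1 .. $]; }
    @property ClassRange save() { return new ClassRange(data); }
}

// Hypothetical wrapper: its copy constructor calls save, so every copy
// gets its own independent iteration state.
struct ValueRange
{
    private ClassRange impl;

    this(ClassRange r) { impl = r; }

    // Copy constructor: runs for `auto w = v;` and for pass-by-value of lvalues.
    this(ref ValueRange rhs) { impl = rhs.impl is null ? null : rhs.impl.save; }

    @property bool empty() const { return impl.empty; }
    @property int front() const { return impl.front; }
    void popFront() { impl.popFront(); }

    // With value semantics, save is just an explicit copy.
    @property ValueRange save() { return ValueRange(impl.save); }
}

void main()
{
    static assert(isForwardRange!ValueRange);

    auto v = ValueRange(new ClassRange([1, 2, 3]));
    auto w = v;            // copy constructor, i.e. impl.save
    w.popFront();
    assert(v.front == 1);  // original untouched, as with an array
    assert(w.front == 2);
}
```

The cost is an allocation on every copy, which is part of why this isn't
something you'd necessarily want to force on every range.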

Regardless, as things stand, you can't rely on the semantics of copying a
range and basically have to consider that a range has become invalid once
it's been copied. Unfortunately, it's not something that seems to be well
understood and is often handled incorrectly in code. I've been pointing it
out for years (including in my talk at DConf 2015), but we haven't done a
good enough job in general of communicating how the range API works, and
this is one of the details that seems to be very easily missed.
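Here's a sketch of the kind of bug that slips through when generic code is
only tested with arrays, along with the explicit-save alternative. walk and
ClassRange are hypothetical helpers made up for illustration. The same call
leaves an array untouched but consumes a class-based range; passing save()
makes the caller's intent explicit and behaves the same for both.

```d
import std.range.primitives;

// Hypothetical class-based forward range, used only to exercise reference semantics.
final class ClassRange
{
    int[] data;
    this(int[] d) { data = d; }
    @property bool empty() const { return data.length == 0; }
    @property int front() const { return data[0]; }
    void popFront() { data = data[1 .. $]; }
    @property ClassRange save() { return new ClassRange(data); }
}

// Hypothetical generic function: iterates whatever it's given. Whether the
// caller's range is consumed depends entirely on the range's copy semantics.
size_t walk(R)(R r) if (isInputRange!R)
{
    size_t n;
    for (; !r.empty; r.popFront())
        ++n;
    return n;
}

void main()
{
    int[] arr = [1, 2, 3];
    auto cls = new ClassRange([1, 2, 3]);

    // Relying on the implicit copy: fine for the array,
    // but the class range is silently consumed.
    assert(walk(arr) == 3 && arr.length == 3);
    assert(walk(cls) == 3 && cls.empty);

    // Passing save() behaves the same for both kinds of range.
    auto cls2 = new ClassRange([1, 2, 3]);
    assert(walk(arr.save) == 3 && arr.length == 3);
    assert(walk(cls2.save) == 3 && !cls2.empty);
}
```

If a range has been passed to walk without save, the only safe assumption for
the caller is that it's no longer usable.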

- Jonathan M Davis