Range Redesign: Copy Semantics

Mon Jan 22 03:52:02 UTC 2024

On Sunday, January 21, 2024 6:50:26 AM MST Alexandru Ermicioi via
Digitalmars-d wrote:
> On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis
> wrote:
> We could use java Iterator api in this case which has also `bool
> hasNext()` function.

Well, the key change to the API that would really matter IMHO would be to merge
front and popFront, since that would eliminate the need for basic input
ranges to cache front, which many of them are forced to do right now. I'd
have to think about it more to see whether having hasNext instead of
returning a nullable type from next would be an improvement, make it worse,
or be more or less equal. A lot of that depends on how it would affect the
typical implementation of a basic input range.
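
To make the caching point concrete, here is a minimal sketch. The names
Source, CachingRange, and NextRange are made up for illustration, and
Nullable!int is only one of the return types floated in this thread, not a
settled API. The source is assumed to be consuming, i.e. reading a value
removes it.

```d
import std.typecons : Nullable, nullable;

// Hypothetical consuming source: each call to read() uses up one value.
struct Source
{
    int[] data;

    bool exhausted() const { return data.length == 0; }

    int read()
    {
        int value = data[0];
        data = data[1 .. $];
        return value;
    }
}

// Current API: front can be called any number of times before popFront,
// so the element has to be read ahead of time and cached.
struct CachingRange
{
    Source* src;
    int cached;
    bool done;

    this(Source* src)
    {
        this.src = src;
        popFront(); // prime the cache
    }

    bool empty() const { return done; }
    int front() const { return cached; }

    void popFront()
    {
        if (src.exhausted)
            done = true;
        else
            cached = src.read();
    }
}

// Sketch of a merged API: next() fetches and advances in one step, so
// there is no element to cache.
struct NextRange
{
    Source* src;

    Nullable!int next()
    {
        if (src.exhausted)
            return Nullable!int.init; // null means the range is empty
        return nullable(src.read());
    }
}

unittest
{
    auto a = Source([1, 2, 3]);
    int sum;
    for (auto r = CachingRange(&a); !r.empty; r.popFront())
        sum += r.front;
    assert(sum == 6);

    auto b = Source([1, 2, 3]);
    auto n = NextRange(&b);
    sum = 0;
    for (auto e = n.next(); !e.isNull; e = n.next())
        sum += e.get;
    assert(sum == 6);
}
```

In this sketch, the merged version drops the cached and done fields along
with the constructor that primes them.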

> Then new input range api can also be propagated to other types of
> range such as forward range and further.

Part of the point here is to _not_ do that, because they fundamentally have
different semantics.

> > where next returns a nullable type where the value returned is
> > the next element in the range, with a null value being returned
> > if the range is empty. The return type would then need to
> > emulate a pointer - specifically, when casting it to bool, it
> > would be true if it's non-null and false if it's null, and
> > dereferencing it would give you the actual value if it's
> > non-null (with it being undefined behavior if you dereference
> > null). So, a basic input range of ints might define next as
> >
> >     int* next();
> >
> > or alternatively, it could be something like
> >
> >     Nullable!int next();
>
> The returned type should be a tagged union, that can allow
> storing of actual value inside, or pointer to it, while it's
> interface will hide the details, i.e. get would look like this:
> `ref T get()`.

Requiring the pointer API already does that. It's just that checking whether
the value is null means casting to bool, and you dereference the return
value rather than calling get. The implementation would be quite free to
then use a Nullable or some other type which does whatever it wants
internally so long as casting to bool tells you whether the object contains
a value, and dereferencing it gives you the value.
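
As a rough sketch of what "emulating a pointer" could look like, the code
below defines a hypothetical Element wrapper that supports casting to bool
and dereferencing, plus a generic consumer that works unchanged with a range
returning int* and one returning Element!int. All of the names here
(Element, PtrRange, ValueRange, sumAll) are assumptions for illustration.

```d
// Hypothetical pointer-like wrapper: holds a value or nothing.
struct Element(T)
{
    private T value;
    private bool hasValue;

    this(T value)
    {
        this.value = value;
        hasValue = true;
    }

    // cast(bool) e, if (e), and !e report whether a value is present.
    bool opCast(U)() const if (is(U == bool))
    {
        return hasValue;
    }

    // *e yields the stored value; dereferencing an empty Element is a bug.
    ref inout(T) opUnary(string op)() inout if (op == "*")
    {
        assert(hasValue, "dereferenced an empty Element");
        return value;
    }
}

// Returns real pointers into its array.
struct PtrRange
{
    int[] data;

    int* next()
    {
        if (data.length == 0)
            return null;
        int* p = &data[0];
        data = data[1 .. $];
        return p;
    }
}

// Generates values on the fly, so it returns them in a wrapper.
struct ValueRange
{
    int current, limit;

    Element!int next()
    {
        if (current >= limit)
            return Element!int.init; // empty
        return Element!int(current++);
    }
}

// Generic code relies only on the bool conversion and on dereferencing.
int sumAll(R)(ref R range)
{
    int total;
    while (true)
    {
        auto e = range.next();
        if (!e)
            break;
        total += *e;
    }
    return total;
}

unittest
{
    auto p = PtrRange([1, 2, 3]);
    assert(sumAll(p) == 6);

    auto v = ValueRange(0, 4); // yields 0, 1, 2, 3
    assert(sumAll(v) == 6);
}
```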

> Input ranges could just be disallowed in foreach statements, that
> would solve different semantics between them and forward ranges,
> just like how in Java it is done with Stream api.

They could be, but that wouldn't be very user-friendly. And it doesn't solve
the different semantics of copying them.

> Imho, this proposal is complicated, and unnecessarily complicates
> construction of ranges, making them less appealing to implement
> in user code. I'd opt for restricting copying completely, and
> allow copying through `.save` only.
>
> The `.next` method proposal is a good improvement though, with
> addition of `.hasNext` method at minimum.

Restricting copying would make ranges borderline unusable. They have to be
able to be passed around and be wrappable, which means either copying them
or moving them, and moving them would be very un-user-friendly, since it
would require explicit calls to move all over the place. What we
really want to be able to do in general is treat ranges like dynamic arrays
(and part of the reason to give basic input ranges a different API is
because they can't have the same copy semantics as dynamic arrays).

The suggested API would actually simplify forward ranges considerably,
because then most of them wouldn't have to implement save any longer, and
those few that do would just implement the necessary semantics via their
copy constructor (or a more advanced implementation might use
reference-counting, which would be somewhat more complex, but the vast
majority of ranges would not be in that situation). And the result would be
that forward ranges in general could be treated just like dynamic arrays,
which would simplify the vast majority of range-based code and allow
range-based code to rely on the copy and assignment semantics instead of
having to carefully avoid using ranges which have been copied and avoid
assignment like they have to do now. So, as far as forward ranges go, this
seems to me like it's very much a simplification, not something complicated.
If this is coming across as complicated, then it's probably because of all
of the explanatory text I had to give as to why these changes should be
made. But the changes themselves simplify things for forward ranges.
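
For illustration, here is a sketch of the copy semantics in question. It
assumes that copying a forward range would yield an independent iteration
position, the way copying a dynamic array does. Counter and HeapCounter are
hypothetical types: the first gets those semantics for free because its state
is held by value, and the second is one of the rarer cases that would need a
copy constructor.

```d
// Plain value state: an ordinary copy is already independent, so there is
// nothing extra to write.
struct Counter
{
    int current, limit;

    bool empty() const { return current >= limit; }
    int front() const { return current; }
    void popFront() { ++current; }
}

// State behind a pointer: a plain copy would share the position, so the
// copy constructor gives each copy its own.
struct HeapCounter
{
    private int* current;
    private int limit;

    this(int start, int limit)
    {
        current = new int(start);
        this.limit = limit;
    }

    this(ref const HeapCounter other)
    {
        current = new int(*other.current);
        limit = other.limit;
    }

    bool empty() const { return *current >= limit; }
    int front() const { return *current; }
    void popFront() { ++*current; }
}

unittest
{
    // Dynamic arrays: the copy keeps its own position.
    int[] a = [1, 2, 3];
    int[] b = a;
    a = a[1 .. $];
    assert(a[0] == 2);
    assert(b[0] == 1);

    // Value-state range: same behavior with no extra code.
    auto c = Counter(0, 3);
    auto d = c;
    c.popFront();
    assert(c.front == 1);
    assert(d.front == 0);

    // Pointer-state range: same behavior via the copy constructor.
    auto e = HeapCounter(0, 3);
    auto f = e;
    e.popFront();
    assert(e.front == 1);
    assert(f.front == 0);
}
```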

The part that's arguably more complex is simply that the basic input ranges
then have to have a separate API, but since they fundamentally have
different copy semantics and can't be used in most range-based functions
anyway, I don't think that it complicates things much. And it makes the
distinction and behavioral differences between basic input ranges and
forward ranges much clearer than they typically are now, which would make
understanding ranges easier.

- Jonathan M Davis