Range Redesign: Copy Semantics

Mon Jan 22 22:46:28 UTC 2024

On Monday, January 22, 2024 2:17:08 AM MST Alexandru Ermicioi via Digitalmars-
d wrote:
> On Monday, 22 January 2024 at 03:52:02 UTC, Jonathan M Davis
>
> wrote:
> > I'd have to think about it more to see whether having hasNext
> > instead of returning a nullable type from next would be an
> > improvement, make it worse, or be more or less equal. A lot of
> > that depends on how it would affect the typical implementation
> > of a basic input range.
>
> Well one advantage of `hasNext` is that you can check in advance
> whether range is empty or not, without the need to pop front
> element.

true

> > Part of the point here is to _not_ do that, because they
> > fundamentally have different semantics.
>
> That would mean that you need to overload a method to accept both
> input, and forward ranges when foreach is not used, right?

Yes, though in practice, most functions tend to require forward ranges
anyway. So, while the separation will be annoying in some cases, in general,
I wouldn't expect it to be a big problem, and if anything, it will help make
the distinction between basic input ranges and forward ranges clearer, which
will probably make it easier to reason about the code. Obviously, we'll have
to actually do it to see what the impact is, but realistically, about all
you can do with basic input ranges is iterate through them once and possibly
do transformations on their elements as you do it. The fact that they can be
iterated separately from a foreach loop is beneficial, and some range-based
functions are useful with them (e.g. map), but most range-based algorithms
need forward ranges.

> > If this is coming across as complicated, then it's probably
> > because of all of the explanatory text I had to give as to why
> > these changes should be made. But the changes themselves
> > simplify things for forward ranges.
>
> Could be the case :). Perhaps it would be nice to see it as a
> code example.

Well for forward ranges, you'd basically be writing what you write now
except that you wouldn't have to worry about calling save anywhere, which
would signifcantly reduce the number of bugs that crop into range-based
code. So, about all I could really show you would be normal range-based code
that just doesn't call save, because it doesn't need to.

It would also allow you to assign one range to another (assuming that
they're the same type), which you can't do right now without relying on the
specific copy semantics of a range. But for forward ranges, the main change
here would simply be that you could assume that copying a range was
equivalent to save, which is purely a simplification. If there's a
complication, it's only that ranges which are currently reference types
would have to do something smarter than saving in their copy constructor if
they wanted to be efficient. So, they'd work with something like

struct Range(T)
{
    T front() { return _range.front; }
    void popFront() { _range.popFront(); }
    bool empty() { return _range.empty; }

    // or copy constructor
    this(this)
    {
        _range = _range.save;
    }

    private static class RefRange { ... }

    private RefRange _range;
}

but if you wanted them to be more efficient, you'd need something like

struct Range(T)
{
    T front() { return _range.front; }

    void popFront()
    {
        if(_range._refCount != 1)
            _range = _range.save; // ref count now 1

        _range.popFront();
    }

    bool empty() { return _range.empty; }

    // or copy constructor
    this(this)
    {
        _range.incRef();
    }

    ~this()
    {
        _range.decRef();
    }

    private static class RefRange { ... }

    private RefRange _range;
}

So, there would be some extra complication for reference type forward
ranges, but most forward ranges already do the equivalent of calling save
when they're copied, which is a large part of why you frequently end up with
bugs when you try to use a range that needs you to call save explicitly.

- Jonathan M Davis