Range Redesign: Empty Ranges

Mon Mar 4 21:29:40 UTC 2024

Okay. A few weeks ago, I made a post about potential design decisions when
updating the range API in the next version of Phobos, and this is a
continuation of that, albeit with a different set of problems. Specifically:

1. The range API does not currently - and cannot currently - require or
guarantee that the init value of a range is valid.

2. The range API provides no way (other than fully iterating through a
range) to get an empty range of the same type from a range unless the range
is a random-access range.

Fixing the first issue is straightforward, I think, though it potentially
affects the second, and of course, someone here may have thoughts or
insights which I've missed.

The issue with init is that of course code wants to use the init value, and
code often _does_ use the init value for specific ranges, but doing so when
that specific range type does not specify that the init value is valid (and
documentation almost never does) is relying on unspecified behavior - and in
the general case, using the init value of a range cannot be specified
behavior. The obvious case is ranges which are classes, because their init
value is going to be null, meaning that using the init value of a range
could result in a segfault. Of course, _most_ range-based code doesn't use
classes for ranges, so I expect that it's quite rare to hit that problem in
practice, but most range-based code uses Voldemort types and tends to ignore
the fact that the init value can still be used (either by using typeof on
the result to get its type or by getting init directly from the result -
e.g. range.init). And you actually need to do that when storing a range as a
member variable. E.G.

struct Foo
{
    ...
    auto _range = typeof(chain(func1(), func2(), func3())).init;
    ...
}

The more correct thing to do would be something like

struct Foo
{
    ...
    auto _range = Nullable!(typeof(chain(func1(), func2(), func3())));
    ...
}

so that you can avoid using the range's init value, but realistically, I
don't think that it's something that many folks think of. They use the
range's init value, and it works (since it often does work), and then they
end up relying on how that range type happens to work at the moment, which
could change in the future, since the range API does not guarantee that it
works, and the folks maintaining that range probably aren't thinking much
about the init value anyway - especially when it's a Voldemort type. The
typical expectation is that the user will be creating the range by calling
the function that returns it, not by using type introspection to get their
hands on the init value. But it is always possible to get the init value,
and code declaring member variables from ranges will use it unless it goes
to the extra effort of using something like Nullable so that it's possible
to store the range without using its init value as the default value.

Now, I think that the fix for this particular problem is pretty easy. We
just require that all ranges have a valid init value. This means disallowing
classes or pointers being used directly as ranges, but we were already
looking at doing that for forward ranges anyway in order to be able to get
rid of save. It's arguably more problematic for basic input ranges, but it's
quite feasible there as well. The previous thread discussed requiring that
they be reference types, which would lean towards them being pointers or
classes, but you can have structs which are reference types (e.g. they
contain a class reference but then have the logic around it to ensure that
it's not used when null). Also, it was proposed in that thread that we
require that basic input ranges be non-copyable, which would also make it
straightforward to require that the init value be valid, since in that case,
all basic input ranges would have to be structs regardless of what their
internals were doing with pointers or references. Either way, we can
disallow classes and pointers being used directly as ranges and require that
all ranges have a valid init value.

So, I'm proposing that we require that all ranges have a valid init value. I
suspect that the issue will still often be forgotten, particularly since
ranges are often Voldemort types, but if the range API explicitly requires
it, and the range documentation is clear on that point, then we can clearly
consider it a bug when a range has an invalid init value, and in most cases,
I wouldn't expect it to be a big deal to fix.

The second issue is then the ability to get an empty range from a range type
in O(1) which is the same type as the original range. In my experience, it's
sometimes the case that you really want to be able to do something like

range = Range.emptyRange;

or

range.makeEmpty();

in order to make a range empty without eating up CPU cycles calling popFront
until empty is true. This is possible with a random-access range by slicing
it with a zero length, but it's not possible with lower classes of ranges.
std.range.takeNone tries to do it, but it is forced to give you a new range
type if the range isn't a random-access range - specifically, it uses
takeExactly, but regardless, that doesn't work in cases where you need to
keep the range type the same but make it empty (e.g. because you're dealing
with a ref parameter). And right now, that means calling popFront over and
over again.

Of course, if you're dealing with an infinite range, then there can't be a
way to make it empty, but IMHO, we should have a way to get an empty range
from an arbitrary finite range type where the empty range is of the same
type. And for that, I see three possibly options:

1. Make it so that for finite ranges, the init value of a range is required
to not just be valid, but it's also required to be empty.

2. We add a static / enum member to the range API called something like
emptyRange where you can get an empty range from that range type where the
range is that range type.

3. Add a new member function to the range API called something like
makeEmpty which makes that particular instance empty.

My gut reaction is to go with #1, because it doesn't involve any new
functions, and in general, the init value probably should be empty. However,
my concern there is that there may be ranges where that's going to constrain
the implementation in annoying ways. All finite ranges have to have an empty
state, because you'll get to empty if you call popFront enough times, so it
should be possible for all ranges to have their init value be empty, but I
could also see it being the case where requiring that would mean that the
implementer of the range would have to jump through some annoying hoops to
make it work - though in most cases, if they have to jump through hoops, I
would expect that they'd have to jump through most (or all) of those hoops
to make the init value valid whether it's empty or not. But I would like
feedback on how reasonable folks think it would be to require that a range's
init value be empty.

Between #2 and #3, I think that #2 is more desirable, because then you can
get an empty range without having to create one with elements first, whereas
if we went with makeEmpty, then you'd need to construct a range first just
to make it empty. That being said, implementing emptyRange (or whatever we'd
call it) wouldn't be all that different from making the init value empty.
The main difference I see is that emptyRange wouldn't have to be statically
known, which might be useful in some cases, though the kinds of cases that I
can think of tend to be ones where making the init value valid would require
additional logic, in which case, in going to the effort of making the init
value valid, you might as well make it empty anyway.

I expect that for some ranges, makeEmpty would be easier to implement, and
it would probably also work better if we decide to make basic input ranges
non-copyable, since mutating a non-copyable range would be more
straightforward than moving an empty one on top of it. But I would think
that we could make the latter work, and in the general case, being able to
have an empty range for any finite range in O(1) would be nice to have
without having to create a range just to make it empty.

Either awy, one advantage to having emptyRange / makeEmpty over using the
init value as empty would be that while adding a new function would be
annoying, it would force anyone implementing a range to take into account
the fact that they need to provide a way to get an empty range. In many
cases, it may just be

    alias emptyRange = typeof(this).init;

but range implementers would be forced to think about it, whereas if we just
made it so that the documentation said that they had to make the init value
empty, they'd be less likely to notice. But it's still extra stuff that has
to implemented which would be completely unnecessary with ranges that would
have empty init value anyway.

A related issue is that if we're removing save from forward ranges, and we
keep the rest of the functions the same, some of the old basic input ranges
then look like the new forward ranges (since they wouldn't have save, and
forward ranges wouldn't require it), which would be bad, because they
wouldn't have the correct semantics. To fix that, we either need to add
something to the range API (be it a whole new function like emptyRange or
something like an enum that flagged a type as being a forward range), or
we'd have to change the existing functions (be it something as simple as
changing empty to isEmpty or something more complicated like switching from
front/popFront to head/tail, which would also allow us to get around the
tail-const problem but would be a pretty massive change to how ranges work).
Adding emptyRange or makeEmpty _almost_ fixes that problem, but since
infinite ranges wouldn't have have it (since they couldn't implement it), it
wouldn't actually fix the problem. Some old basic input ranges which are
infinite would still pass the new isForwardRange.

So, if it weren't for the infinite range issue, I'd see the fact that it
would change the API enough to distingush between the old and new APIs as a
good reason to go with emptyRange over requiring that the init value be
empty, but since it doesn't fix the issue with infinite ranges, I don't
think that it makes sense to go with emptyRange / makeEmpty just to
differentiate between the old and new range APIs (it would make it more
obvious to programmers using these ranges, which could be valuable - but
then again, having isEmpty instead of empty would do the same).

So, hopefully that makes the situation with those two issues clear. The
question then is what insights you folks might have on them and the proposed
solutions. It may even be that it's way too onerous to require that the init
value be valid, and someone has a use case to show that, though I very much
hope that that is not the case. I expect though that the bigger question is
whether init should be required to be empty for finite ranges or whether it
would be better to add a new function for that.

Thoughts?

- Jonathan M Davis