Requirements for Slicing Ranges

Tue Sep 3 00:33:28 UTC 2024

On Monday, September 2, 2024 3:59:04 AM MDT Sebastiaan Koppe via Digitalmars-d 
wrote:
> On Monday, 2 September 2024 at 02:47:16 UTC, Jonathan M Davis
>
> wrote:
> > Does anyone know of a situation where that would not be the
> > case? I would love to just ditch the test for slicing in the
> > updated range API and tie the ability to random-access ranges,
> > since it would be simplify things. If I had to guess without
> > digging through all of the history, I suspect that
> > random-access ranges were added before slicing was and that
> > that's why they weren't tied together, but in practice, they
> > always seem to be, and I can't think of a reason at the moment
> > why they can't be.
> >
> > Can anyone see something that I'm missing here?
> >
> > - Jonathan M Davis
>
> I briefly wondered how it would work if the range is
> non-copyable, since taking a slice is effectively taking a copy
> of the underlying data the range captures (and its lifetime). But
> I remember you saying in another thread that forward ranges have
> to be copyable, correct?

Slicing doesn't copy the underlying data. It just gives a range which refers
to fewer elements from the original where the new range can be iterated
independently. Just think about dynamic arrays / slices (which is what the
range API is attempting to create an abstraction of). If you do something
like arr[5 .. 12], you get a new dynamic array which is a slice of the
original dynamic array starting at index 5 and going through index 11. Each
dynamic array / slice refers to the same memory, and none of the elements
were actually copied. The same occurs when slicing a range. You get two
ranges which can be independently iterated, but the elements themselves are
not necessarily independent (they will be if the range is a value type, but
in most cases, they'll refer to the same memory; it's just the iteration
state which is guaranteed to be independent).

Similarly, save gives you a range which is independent with regards to its
iteration state. Both the copy and the original can be iterated without
affecting the other, but the elements themselves are not necessarily copied
(though in some cases they are), and mutating the elements in one will often
mutate the elements in the other. If you want to make sure that all of the
elements are copied, then you'd use something like std.array.array to copy
all of the range's elements into a dynamic array.

Essentially, save is equivalent to range[0 .. $], but there are plenty of
ranges that can support copying their full iteration state but which can't
support copying their iteration state while skipping a bunch of elements.

With the proposed API, copying ranges in general then becomes essentially
equivalent to save (though an implementation could use reference counting
and avoid actually doing save internally until the range is mutated in order
to avoid copying its state in the cases where there's only one copy). So,
a non-copyable range would be equivalent to a basic input range - and in
general, such a range is not a forward range precisely because it can't
implement save in O(1). And if a range can't implement save in O(1), then it
can't implement slicing in O(1).

So, if a range has to be a basic input range, then it makes no sense for it
to have slicing - and the current API does not allow such a range to have
slicing, since hasSlicing checks for isForwardRange. My issue is that as far
as I can tell, if a range can implement slicing in O(1), then it should be
possible for it to implement indexing in O(1) (and vice-versa), in which
case, we might as well simplify things and just have isRandomAccessRange
require slicing and not support ranges which have slicing but not
random-access with the idea that there's no reason why a range that can
support one can't support the other.

> So I guess the question for me boils down to whether there exists
> a need for a non-copyable range that provides random access and
> doesn't allow copies. (Although I guess with ref and dip1000 you
> could make that work, but lets not go there :smile).

If a range cannot implement save - or with the new API, cannot be copyable -
then that's because it cannot copy its iteration state in O(1). I suppose
that there could be cases where it's technically possible to implement save
/ copying in O(1), but the programmer doesn't want to (e.g. that would
arguably make sense with random number generators, since copying those
accidentally can cause result in reusing numbers). However, in the normal
case, a range doesn't implement save (or isn't copyable in the new API),
because it can't do it in O(1).

If a range can't copy its iteration state in O(1), then slicing won't work
(since that needs to be O(1), and it's copying the iteration state), and in
such a case, I'm pretty sure that it's not possible to implement indexing in
O(1) - though maybe someone can come up with a corner case where it is
possible. But as long as the ability to implement indexing in O(1) implies
the ability to implement slicing in O(1) (and vice versa), it would make
sense to just tie random-access and slicing together in order to simplify
the API.

I guess that the question then is ranges which _could_ copy their iteration
state in O(1) but where the programmer decided to not allow it for one
reason or another. Sure, then it could be possible to implement
random-access in O(1), since in that situation, the lack of copyability has
nothing to do with the ability to implement it, but as things stand,
bidirectional ranges are always forward ranges, and finite random-access
ranges are always bidirectional ranges (whereas infinite random-access
ranges are forward ranges but can't be bidirectional ranges for obvious
reasons).

So, to allow a non-copyable range to be be bidirectional or random-access
would make it so that it's no longer the case that bidirectional ranges and
random-access ranges are guaranteed to be forward ranges. And we _could_
technically do that, but that would certainly complicate things, and it
would be for a pretty rare use case. Also, I'm pretty sure that it's the
case that most algorithms which require access to the end of a range or to
elements in the middle of a range also require the ability to copy the
iteration state of a range. There will be exceptions, but I'm not sure that
they're worth the trouble of further complicatin the range API.

The question for which I created this thread was based on the assumption
that a range would implement all of the functionality that it could. If it
doesn't, then it's certainly the case that a range _could_ implement slicing
and not be random-access - just like it's possible to have a range that
could implement save but which doesn't.

So, yeah, the situation changes considerably if we don't treat the range
categories as being a linear hierarchy, but I'm inclined to keep the
hierarchy as-is, because it's much simpler that way, and I expect that the
need for something like a random-access range which can't be a forward range
would be quite rare. The whole reason that I'm asking if anyone can think of
an example where it would be possible to implement indexing but not slicing
(or vice versa) is in order to be able to simplify the range API by getting
rid of hasSlicing, whereas if we allow basic input ranges to be
bidirectional or random-access, then those traits become more like
hasSlicing, and the hierarchy is lost, which makes the whole situation more
complicated rather than simpler.

And in cases where you have something on the stack which you don't want to
copy, then you have basically the same situation as static arrays, and you
can always create a range that refers to that memory and use that rather
than having a range be non-copyable because of where it is. Sure, that then
starts getting into scope and DIP 1000 if you want the compiler to track
stuff for you, but the way that range-based algorithms typically work, the
dangers of escaping are pretty minimal, since either they take the argument
by ref and mutate the original, or they take a copy and return a copy.

- Jonathan M Davis