Range Redesign: Copy Semantics

Sun Jan 21 13:50:26 UTC 2024

On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis 
wrote:
> Ultimately, I'm inclined to argue that we should give basic 
> input ranges a new API - not just because it would allow them 
> to use reference counting, but also because the current input 
> range API tends to force basic input ranges to cache the value 
> of front (which if anything, encourages basic input ranges to 
> be pseudo-reference types and could result in an extra layer of 
> indirection in some cases if they're forced to be reference 
> types). It would be annoying in some cases to require that 
> different functions be used for basic input ranges and forward 
> ranges (though overloading could obviously be used to avoid 
> needing different names), but it's already often the case that 
> code isn't really designed to work with both, and overloading 
> on the category of range being used is already fairly common, 
> since different range capabilities allow for different 
> implementations. So, given that it would prevent whole classes 
> of copying bugs as well as potentially remove the requirement 
> to cache front for basic input ranges, I think that a separate 
> API for basic input ranges is warranted. What I would propose 
> for that would be a single function
>
>     auto next();

We could use the Java Iterator API here, which also has a `bool 
hasNext()` method.
The new input range API could then be extended to the other range 
categories as well, such as forward ranges and beyond.
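
To make the idea concrete, here is a rough sketch of what such a 
range might look like. `Countdown` and `drain` are made-up names for 
illustration, and `hasNext`/`next` are the proposed members, not 
anything that exists in Phobos today.

```d
import std.stdio : writeln;

// Hypothetical basic input range in the Java Iterator style.
struct Countdown
{
    private int remaining;

    // True while there is at least one more element.
    bool hasNext() const { return remaining > 0; }

    // Returns the current element and advances past it.
    // The caller is expected to check hasNext first.
    int next()
    {
        assert(hasNext, "next called on an exhausted range");
        return remaining--;
    }
}

// Consumes the range through the two-method API.
int[] drain(ref Countdown range)
{
    int[] result;
    while (range.hasNext)
        result ~= range.next;
    return result;
}

void main()
{
    auto r = Countdown(3);
    writeln(drain(r)); // elements come out as 3, 2, 1
}
```

Note that nothing here needs a cached `front`: the element is only 
produced at the moment `next` is called.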

> where next returns a nullable type where the value returned is 
> the next element in the range, with a null value being returned 
> if the range is empty. The return type would then need to 
> emulate a pointer - specifically, when casting it to bool, it 
> would be true if it's non-null and false if it's null, and 
> dereferencing it would give you the actual value if it's 
> non-null (with it being undefined behavior if you dereference 
> null). So, a basic input range of ints might define next as
>
>     int* next();
>
> or alternatively, it could be something like
>
>     Nullable!int next();

The returned type should be a tagged union that can store either 
the actual value or a pointer to it, while its interface hides 
those details, i.e. `get` would look like this: 
`ref T get()`.
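
Something along these lines, as a minimal sketch only. `Element`, 
`of` and `refTo` are names I made up for the example; the only parts 
taken from the discussion are the bool conversion and `ref T get()`.

```d
// Hypothetical return type for next(): holds nothing, an inline
// value, or a pointer to a value owned by the range.
struct Element(T)
{
    private enum Kind { none, value, pointer }
    private Kind kind = Kind.none;
    private union
    {
        T value;
        T* pointer;
    }

    // Stores a copy of the value inside the Element itself.
    static Element of(T v)
    {
        Element e;
        e.kind = Kind.value;
        e.value = v;
        return e;
    }

    // Stores only a pointer to a value that lives elsewhere.
    static Element refTo(T* p)
    {
        Element e;
        e.kind = Kind.pointer;
        e.pointer = p;
        return e;
    }

    // Lets the result be tested in if/for conditions.
    bool opCast(B : bool)() const { return kind != Kind.none; }

    // Same accessor regardless of how the value is stored.
    ref T get() return
    {
        assert(kind != Kind.none, "get called on an empty Element");
        if (kind == Kind.value)
            return value;
        return *pointer;
    }
}

void main()
{
    int stored = 10;
    auto a = Element!int.of(5);
    auto b = Element!int.refTo(&stored);
    Element!int none;

    a.get() += 1; // modifies the inline copy
    b.get() += 1; // modifies `stored` through the pointer

    assert(a.get() == 6);
    assert(stored == 11);
    assert(cast(bool) a && cast(bool) b);
    assert(!cast(bool) none);
}
```

The caller never needs to know which storage was used, so a range 
can return small values inline and large ones by pointer.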

> though that wouldn't work with Phobos' current Nullable type, 
> since it doesn't support either casting to bool or 
> dereferencing (probably because it originally used alias this). 
> Either way, since we'd just require the specific API for the 
> return type rather than requiring a pointer, the range type 
> would have some flexibility in what it used.
>
> This would then mean that if you wanted to loop over a basic 
> input range, you'd do something like
>
>     for(auto front = range.next; front; front = range.next)
>     {
>         ...
>     }
>
> And if we go down this road, then we could also add this API to 
> foreach, allowing for code such as
>
>     foreach(e; basicInputRange)
>     {
>         ...
>     }
>
> to work - and unlike now, you could rely on the copy semantics 
> involved such that you would know that you could then break out 
> of the loop and continue to use the range (whereas right now, 
> you can't safely break out of a foreach and then continue to 
> use the range that you were iterating over). Of course, for

Input ranges could simply be disallowed in foreach statements. That 
would remove the difference in semantics between them and forward 
ranges, much like Java does with its Stream API.

> What I'd like to get out of this thread is feedback on how much 
> sense this idea does or doesn't make and what problems I'm 
> missing that someone else is able to see. From what I can see, 
> the main negative is simply that you can't then write code that 
> works on both a basic input range and a forward range (though 
> you can obviously still create function overloads so that the 
> caller can use either), but given the issues surrounding copy 
> semantics, I think that that's probably ultimately a good 
> change (and the number of functions that can operate on basic 
> input ranges is already pretty limited anyway in comparison to 
> forward ranges - particularly with regards to generic 
> algorithms). It will also make it much easier to discuss the 
> separation between basic input ranges and forward ranges, which 
> IMHO can be too easy to lose or confuse as things stand.

IMHO, this proposal is complicated and needlessly complicates 
writing ranges, making them less appealing to implement 
in user code. I'd opt for disabling implicit copying entirely and 
allowing copies only through `.save`.
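
As a rough illustration of what I mean, assuming the `hasNext`/`next` 
API from above (`Numbers` is a made-up example type): the postblit 
is disabled, so `.save` is the only way to get a second, independent 
range.

```d
// Hypothetical forward range that cannot be copied implicitly.
struct Numbers
{
    private int current;
    private int limit;

    this(int current, int limit)
    {
        this.current = current;
        this.limit = limit;
    }

    // Implicit copies become compile-time errors.
    @disable this(this);

    bool hasNext() const { return current < limit; }
    int next() { return current++; }

    // The only way to duplicate the iteration state.
    Numbers save() const { return Numbers(current, limit); }
}

void main()
{
    auto a = Numbers(0, 3);
    a.next();        // consumes 0, a now stands at 1
    auto b = a.save; // explicit, independent copy

    // An accidental copy does not compile.
    static assert(!__traits(compiles, { auto c = a; }));

    assert(b.next() == 1);
    assert(b.next() == 2);
    assert(!b.hasNext);

    // Advancing b did not affect a.
    assert(a.hasNext);
    assert(a.next() == 1);
}
```

Passing such a range to a function then has to be done by `ref` or 
through an explicit `.save`, which makes the copy semantics visible 
at the call site.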

The `.next` method proposal is a good improvement though, with 
the addition of at least a `.hasNext` method.