Proposed Changes to the Range API for Phobos v3

Sat May 18 14:26:18 UTC 2024

On Thu, May 16, 2024 at 08:56:55AM -0600, Jonathan M Davis via Digitalmars-d wrote:
[...]
> 1. The easy one is that the range API functions for dynamic arrays will not
> treat arrays of characters as special. A dynamic array of char will be a
> range of char, and a dynamic array of wchar will be a range of wchar.
> 
> Any code that needs to decode will need to use the phobos v3 replacement for
> std.utf's decode or decodeFront - or use foreach - to decode the code units
> to code points (and if it needs to switch encodings, then there will be
> whatever versions of byUTF, byChar, etc. that the replacement for std.utf
> will have).

I thought we already have this?  std.string.representation,
std.uni.byCodePoint, std.uni.byGrapheme already fill this need.
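
For concreteness, here is a minimal sketch of those levels against current
Phobos (v2), not the proposed v3 API.  std.utf.byCodeUnit is my addition to
the list above; the sample string is just an illustration.

```d
import std.range : walkLength;
import std.string : representation;
import std.uni : byCodePoint, byGrapheme;
import std.utf : byCodeUnit;

void main()
{
    // 'e' followed by U+0301 (combining acute accent): one visible character.
    string s = "e\u0301";

    // Raw UTF-8 code units: 1 byte for 'e' + 2 bytes for U+0301.
    assert(s.representation.length == 3);
    assert(s.byCodeUnit.walkLength == 3);

    // Code points: 'e' and U+0301.
    assert(s.byCodePoint.walkLength == 2);

    // Graphemes: the two code points combine into one cluster.
    assert(s.byGrapheme.walkLength == 1);
}
```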

[...]
> However, with infinite ranges, there is no such solution. If they
> cannot be default-initialized, then they either can't be ranges, or
> they would have to be finite ranges which would just never be empty if
> they're constructed at runtime (while doing something like the flag
> trick to make their init value empty). And it's certainly true that
> the range API doesn't (and can't) guarantee that finite ranges are
> truly finite, but it's still better if we can define infinite ranges
> that need to be constructed at runtime as infinite ranges, since then
> we can get the normal benefits that come from statically knowing that
> a range is infinite.

Infinite ranges also have the peculiarity that slicing may create a
finite range, i.e., the underlying type changes. That's another wrinkle
to deal with.
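
A small sketch of what I mean, using std.range.repeat from current Phobos as
the infinite range (the values are arbitrary):

```d
import std.algorithm.comparison : equal;
import std.range : hasLength, isInfinite, repeat;

void main()
{
    auto r = repeat(7);
    static assert(isInfinite!(typeof(r)));

    // Slicing with two finite bounds yields a finite range of a different type.
    auto s = r[0 .. 3];
    static assert(!isInfinite!(typeof(s)));
    static assert(hasLength!(typeof(s)));
    static assert(!is(typeof(s) == typeof(r)));
    assert(s.equal([7, 7, 7]));

    // Slicing up to $ keeps the range infinite and of the same type.
    auto t = r[2 .. $];
    static assert(is(typeof(t) == typeof(r)));
}
```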

[...]
> 4. All ranges must be either dynamic arrays or structs (and not
> pointers to structs either).

This is not necessarily an ideal solution.  In my own code I've often
had to iterate over forward ranges via sub functions, where I expect the
iteration state after returning from the sub function to be retained.
There are two ways to do this:

1) Have two versions of every iteration function, one taking the range
by value (with implicit saving of current iteration state), the other
taking the range by reference (retain changes to iteration state upon
return), which leads to a lot of code duplication; or:

2) Have a single version of each function and pass the range by
reference, usually by passing a pointer to it (since the current API
would transparently treat the pointer as the range itself).

Prohibiting pointers eliminates option (2), and leaves me with the
non-ideal situation (1) where I need lots of code duplication.

Although, come to think of it, we could have a .byRef range wrapper that
encapsulates a pointer to the range so that changes to iteration state
would be preserved.  But then it raises the question, why not just allow
pointers in the first place?  Why require jumping through extra hoops?
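
To make the difference concrete, here is a minimal sketch using
std.range.refRange from current Phobos, which already wraps a pointer to a
range in roughly this way.  skipN is a hypothetical helper standing in for
one of my sub functions.

```d
import std.range;

// Hypothetical sub function: takes the range by value and consumes
// up to n elements from it.
void skipN(R)(R r, size_t n)
{
    foreach (_; 0 .. n)
    {
        if (r.empty)
            return;
        r.popFront();
    }
}

void main()
{
    auto r = iota(10);

    // By value: only the callee's copy advances.
    skipN(r, 3);
    assert(r.front == 0);

    // Through the wrapper: the caller's iteration state is retained.
    skipN(refRange(&r), 3);
    assert(r.front == 3);
}
```

The wrapper works, but every call site has to remember to spell it out.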

> 11. Finite random-access ranges are required to implement opDollar,
> and their opIndex must work with $. Similarly, any ranges which
> implement slicing must implement opDollar, and slicing must work with
> $.
> 
> In most cases, this will just be an alias to length, but barring a
> language change that automatically treats length as opDollar (which
> has been discussed before but has never materialized and is somewhat
> controversial given types where it wouldn't make sense to treat length
> as opDollar), we have to require that opDollar be defined, or generic
> code won't be able to use $ with indexing or slicing. We probably
> would have required it years ago except that it would have broken code
> to add the requirement.
[...]

Will this also require implementing arithmetic operators on the return
type of opDollar? Otherwise things like r[0 .. $-1] still wouldn't
work correctly. Or r[0 .. complicatedMathFunc(($-1)/2)].
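
A sketch of the two cases I have in mind.  Buf and Endless are hypothetical
types made up for illustration: the first aliases opDollar to length, the
second returns an opaque token type from opDollar, as an infinite range
might.

```d
// opDollar is just length, so $ is a size_t and arithmetic works.
struct Buf
{
    int[] data;
    @property size_t length() const { return data.length; }
    alias opDollar = length;
    ref int opIndex(size_t i) { return data[i]; }
    Buf opSlice(size_t lo, size_t hi) { return Buf(data[lo .. hi]); }
}

// opDollar is an opaque token, so $ supports no arithmetic at all.
struct Endless
{
    static struct End {}
    enum opDollar = End.init;
    enum bool empty = false;
    size_t front;
    void popFront() { ++front; }
    Endless opSlice(size_t lo, End) { return Endless(front + lo); }
}

void main()
{
    auto b = Buf([1, 2, 3, 4]);
    assert(b[$ - 1] == 4);
    assert(b[0 .. $ - 1].length == 3);
    assert(b[0 .. ($ - 1) / 2].data == [1]);

    auto e = Endless(0);
    assert(e[2 .. $].front == 2);
    // Arithmetic on the token does not compile unless End defines operators.
    static assert(!__traits(compiles, e[0 .. $ - 1]));
}
```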

--T