Proposed Changes to the Range API for Phobos v3
Jonathan M Davis
newsgroup.d at jmdavisprog.com
Thu May 16 14:56:55 UTC 2024
Okay. This is my proposal for an updated input range API for Phobos v3. I
previously sent it out to some or the core D folks, and the feedback has
largely been positive, but I'm now posting it to the newsgroup for wider
feedback.
And yes, this is long, but there's a lot to cover, and I want both what's
changing and why it's changing to be clear. Not that anyone expects me to be
short when writing something like this up anyway...
===
This is a proposal for an updated input range API for Phobos v3.
Input ranges as defined in Phobos v2 have worked extremely well for us thus
far, but experience has shown that there are some problems with them, and we
would like to address and fix them with Phobos v3. So, this proposal
attempts to leave input ranges the same as much as reasonable while making
the necessary changes to fix the known problems.
Output ranges will be covered in a separate proposal.
A Note on the Enforcement of Range API Requirements
---------------------------------------------------
As a general note with regards to any requirements that the range API has
(be it the current version or the new one), some requirements can be and are
enforced with the traits for ranges (e.g. isInputRange requires that front
result in a value and will be false if front is void), whereas other
requirements cannot actually be enforced (e.g. that front returns the same
value each time that it's evaluated until popFront is called). As such, the
range API statically enforces what it can but can't enforce everything.
This means that when the range API places a requirement on ranges and their
behavior, and that particular requirement cannot be statically enforced, it
_is_ possible to write a type that violates the range API but passes all of
the associated traits and template constraints. Such a type is considered to
be in violation of the range API, and there are no promises that any
range-based function will behave correctly with such a range (since the
range itself does not have the required behavior). Range-based code in
general is free to assume that any ranges that it's given correctly
implement the range API, and it really doesn't make sense for the situation
to be otherwise.
As such, when this proposal says that a range is required to behave in a
certain way, that does not necessarily mean that that behavior is statically
enforced (just like not all requirements of the current range API are
statically enforced). They will be enforced where practicable, but in some
cases, it simply means that it's a requirement that any ranges must meet if
they're going to work correctly with normal range-based code, and anyone
writing range-based code can assume that those requirements have been met.
As is the case with many things, testing will then be required to make sure
that the code is working correctly with regards to anything that couldn't be
statically enforced.
Most of the changes in this proposal revolve around being stricter with the
behavioral requirements for the range API so that range-based code can
actually rely on certain behaviors of ranges (like their copy semantics),
whereas the current range API has been lax on a number of aspects of the
semantics of ranges, making it impossible for generic range-based code to
rely on those semantics.
Now, on to discussing what actually needs to be changed...
Some of the Problems with Input Ranges
--------------------------------------
1. Auto-decoding has not solved the problems that it was intended to solve
(if nothing else, because graphemes are a thing, and auto-decoding does not
take that into account), and it has caused both performance problems and a
proliferation of complications (in particular with all of the code that
attempts to work around auto-decoding to regain performance and avoid having
strings be wrapped by other range types where possible). So, we clearly
would like to get rid of it.
2. Requiring save for forward ranges has been a source of bugs. Most forward
ranges do not require it, so it is routinely forgotten, causing bugs for any
ranges that do require it. Ideally, we would be able to just rely on copies
of forward ranges being independent and thus not need save at all. It would
also bring the semantics of ranges that much closer to dynamic arrays /
slices, which ranges are effectively trying to be a generalization of, just
like iterators in C++ are effectively a generalization of pointers.
3. The semantics of copying from and assigning to ranges are not defined by
the current range API, and because ranges allow such a variety of types with
such a variety of semantics for copying and assignment, we cannot actually
define the semantics of copying or assignment for ranges with the current
API. This puts some annoying restrictions on writing correct generic code
(e.g. not being able to use a range once it's been copied), and ideally, we
would be able to actually define the semantics of copying and assignment so
that generic code can rely on it.
4. There is no guarantee that the init value of a range is even valid - or
that a range can be default-initialized. This causes problems in a number of
cases but in particular when it's necessary to store a range as a member
variable. One recent example of this is
https://issues.dlang.org/show_bug.cgi?id=24202 where a project incorrectly
relied on std.range.chain's init value being both valid and empty (which it
was previously but then wasn't after some changes were made to chain to
improve its performance). As things stand, code that wants to be able to
store a range as a member variable and to do so completely correctly needs
to do something like wrap it in a std.typecons.Nullable to ensure that the
init value isn't actually used, and of course, no one actually does this or
even really thinks about it. In practice, they end up relying on the
behavior of whatever ranges their code currently happens to use, and they
almost certainly don't realize that they're relying on unspecified behavior
that could change in the future. Ideally, we would be able to rely on the
init value being valid - and preferrably empty (at least for finite ranges)
- which is not possible with the current range API.
5. The range API does not provide any way to get an empty range from ranges
that can't be sliced other than iterating through the entire range (ranges
that can be sliced allow it by using the slice operator with indices that
make the length 0). takeExactly allows us to make an empty range from any
range, but the type changes in the process, which does not work in any
situation where code needs to make a range empty but cannot change the type
(e.g. if it's dealing with a ref parameter or a member variable and needs to
set that variable to an empty range).
6. The range API does not currently specify what happens if you copy front
and then call popFront. For the vast majority of code, it is both assumed
that you can copy front and that the copy of front will be unaffected by
calling popFront. However, there _are_ ranges (such as std.stdio's ByLine)
which will do things like use a dynamic array as a buffer which has its
elements overwritten in popFront, resulting in the elements of the copy
being mutated, since it's a slice of the same data. Such ranges are
sometimes referred to as having a transient front, and they do not work with
a lot of range-based code. As such, they either need to be disallowed, or we
need a way to detect such behavior so that range-based code can check for
it.
7. Related to the issue of transient front is the fact that range-based code
in general tends to assume that front is copyable, meaning that it doesn't
work with non-copyable types. This isn't a problem with the range API per
se, but we probably do need to better formalize what you can assume you can
do with front vs what you have to test for.
8. $ cannot be used with random-access ranges in generic code, because the
range API does not require that a random-access range define opDollar.
9. const makes ranges useless. Realistically, a const range is not a range,
because you can't call popFront, and you can't iterate through it
(technically, it _is_ possible to have popFront be const in cases where the
iteration state doesn't actually live in the range, and popFront isn't pure,
but normal ranges do not work that way).
Most range-based code doesn't actually need to worry much about const, since
it's just iterating through the range, and the constness of the elements
often doesn't even matter, since the code is templated and works regardless
of the mutability of the elements. However, once a range is stored in
another type, constness can quickly become a problem. This not a problem
with dynamic arrays / slices, because they can be sliced to get a tail-const
slice of the original, and templates are actually instantiated with the
tail-const type for dynamic arrays, but other range types do not have such
compiler magic at present.
10. While it's not exactly a problem with the range API, it would be nice if
we didn't have to import std.range / std.range.primitives (or the Phobos v3
equivalent) to use arrays as ranges. In theory, we could add the range API
functions for arrays to object.d, but doing so would conflict with the
existing functions in std.range (and bugs with selective imports actually
make it somewhat problematic to resolve those conflicts - see
https://issues.dlang.org/show_bug.cgi?id=24451). And if we're getting rid of
auto-decoding, the range API functions for arrays of characters would
change, meaning that we can't solve the problem by putting the functions in
object.d and then aliasing them in std.range.primitives to avoid problems
with selective imports from std.range or std.range.primitives.
So, the symbol conflicts which would result from putting the range API
functions for arrays in object.d do potentially place some restrictions on
what we do with the new range API.
Overview of Proposed Changes
----------------------------
1. The easy one is that the range API functions for dynamic arrays will not
treat arrays of characters as special. A dynamic array of char will be a
range of char, and a dynamic array of wchar will be a range of wchar.
Any code that needs to decode will need to use the phobos v3 replacement for
std.utf's decode or decodeFront - or use foreach - to decode the code units
to code points (and if it needs to switch encodings, then there will be
whatever versions of byUTF, byChar, etc. that the replacement for std.utf
will have). And if code needs to worry about normalization or other grapheme
stuff, then it'll need to use that functionality from the replacement of
std.uni (which is basically what ranges have to do right now anyway). We may
choose to still have UTFException, but it will be up to the programmer to
choose whether they want to use the replacement character or to throw
exceptions when decoding, because the decoding will no longer be automatic.
As such, using the range API on dynamic arrays will work with @nogc, and we
won't need a bunch of extra code just to avoid the auto-decoding.
This may result in more bugs where programmers slice into the middle of code
points, but auto-decoding just makes that harder rather than actually
preventing it, and there's really no fix for that without doing something
like operating at the grapheme level by default, which would be horribly
inefficient. We probably should have better documentation and tutorials for
how to write range-based code which deals with Unicode correctly, but
ultimately, education is the solution here, not trying to make everything
work at the code point or grapheme level by default. It may make sense to
add a string type of some kind to Phobos which operates at the grapheme
level with its API but which tries to be efficient internally, but that's
really not a question for the range API. It seems to be generally agreed
upon that arrays of code units should be treated as ranges of code units, so
that's what the new range API will do.
2. All ranges which can be default-initialized must have a valid init value
(that is, their range API functions must behave according to the normal
rules for those functions and can't do something like have empty be false
and then have front fail to return a valid value; the init value of all
ranges that can be default-initialized must work properly as ranges).
By requiring that the init value be valid, we avoid all of the problems that
we've been having with ranges which behave badly when they're
default-initialized. Instead, generic code will be able to rely on a
default-initialized range behaving sanely, and the default behavior will be
what most programmers expect, whereas right now, some ranges will actually
segfault if their init value is used (e.g. if the range type is a class).
Of course, this does make it so that classes, interfaces, and pointers can't
be ranges, but there are ways to work around that (as will be discussed
later).
Ideally, we would also disallow ranges from disabling default initialization
so that generic range-based code can rely on it, but that causes problems
with ranges that need to be constructed at runtime to function properly.
That problem can be worked around fairly easily with finite ranges, since
worst case, you can always just add a flag to them which indicates whether
they were default-initialized and then have empty return true in that case
(without having to care about what the range normally does). It's also
likely that such ranges are rare, since it's almost always the case that you
can rework the logic of a finite range to have a valid init value.
However, with infinite ranges, there is no such solution. If they cannot be
default-initialized, then they either can't be ranges, or they would have to
be finite ranges which would just never be empty if they're constructed at
runtime (while doing something like the flag trick to make their init value
empty). And it's certainly true that the range API doesn't (and can't)
guarantee that finite ranges are truly finite, but it's still better if we
can define infinite ranges that need to be constructed at runtime as
infinite ranges, since then we can get the normal benefits that come from
statically knowing that a range is infinite.
Alternatively, we could disallow finite ranges from disabling default
initialization (since they have a workaround) while allowing infinite ranges
to disable it, but that seems like it's too much special-casing and like it
would add to the confusion overall.
And maybe it would be better to just force infinite ranges which can't work
properly unless they're constructed at runtime to be finite ranges given
that they're likely to be pretty rare, but ultimately, I think that it's
more straightforward to just require that the init value be valid if a range
can be default-initialized. The issues with types that can't be
default-initialized already are something which causes problems for generic
code in general, and it would not require any special explanation with the
range API.
It would also fit in with the language better if we ever add default
constructors with a future Edition (not that I expect that to happen, but
forcing range types to have a valid init value would make that harder).
The only real problem I see is that there will likely be some generic code
which will be written to assume that default initialization works for all
ranges, but again, that's a general problem with types that disallow default
initialization, and the bug that you then get is a compilation error,
whereas forgetting something like save results in code that compiles but
does the wrong thing. So, when we do run into such bugs, they won't silently
cause problems.
Weighing the pros and cons here, I think that we're better off just letting
ranges disable default initialization (much as it's annoying in general when
types do that). Simply requiring that the init value of a range that can be
default-initialized be valid fixes the core problem, and then the
initialization problems that we're left with are not range-specific.
3. If a range is finite (that is, it's not statically known to be empty),
and it can be default-initialized, then its init value must be empty. If it
cannot be default-initialized, then it must define a function called
emptyRange which returns an empty range of the same type. Phobos will then
define a function called emptyRange which takes a finite range which can be
default-initialized and returns the init value of that range.
The result of this is that we have a reliable way to get an empty range from
any finite range without changing the type of the range. If we could require
that finite ranges be default-initializable, then we wouldn't need
emptyRange in the same way, but even then, such code would have to check
whether the range was infinite to see if assuming that init was empty was
correct, whereas this way, code can just call emptyRange, and it'll get a
compilation error if it's given an infinite range. So, code that needs an
empty range will get compiler checks which verify that the range is finite,
and it'll work whether the range can be default-initialized or not (though
it'll only work with CTFE if the range can be default-initialized).
Of course, this entry doesn't apply to infinite ranges at all, since whether
they can be default-initialized or not, their empty is always false. The
fact that that can be determined statically is what makes them infinite
ranges.
4. All ranges must be either dynamic arrays or structs (and not pointers to
structs either).
This allows us to have consistent requirements with regards to copying,
assignment, and initialization (e.g. it's not possible to require that the
init value be valid if classes can be ranges). Most ranges are already
dynamic arrays or structs anyway, so for _most_ range-based code, this
requirement is a complete non-issue.
However, the main problem that we then have is the ranges that actually need
to be classes or interfaces. This happens when you need to do stuff like
wrap arbitrary ranges with a single range type at runtime (which
std.range.interfaces currently provides the functionality to do). The
solution to this problem then is to provide struct wrappers. That way, a
range that needs to be a class / interface will instead be a struct wrapping
a class / interface reference. The struct can then ensure that its init
value is valid, and it can ensure that the range has the correct copying and
assignment semantics. So, the replacement for std.range.interfaces will then
provide not only classes and interfaces, but it will provide the appropriate
struct wrappers as well.
So, a function like inputRangeObject might return something like
RangeWrapper!(InputRangeObject!Foo) instead of InputRangeObject!Foo like it
would currently do. The exact details still need to be sorted out, but there
should be no technical barrier to replacing all ranges which are currently
classes or interfaces with ranges that are structs wrapping classes or
interfaces, which would allow us to make it illegal for classes or
interfaces to be ranges without losing the actual functionality that such
ranges currently provide.
5. Forward ranges will no longer have save. Rather, it will be required that
copying a forward range will result in an independent copy. Both the
original and the copy must then have the same elements in the same order,
and iterating one will not affect the other. So, in essence, copying a
forward range must be equivalent to calling save (though the actual
implementation could be more clever than that).
So, for most forward ranges, this will simply mean that they won't need a
save function that just returns the this reference, and range-based code
will not need to call save.
For the ranges that would currently require that save be called, they will
have to be implemented in a way that save is no longer required (which is
made possible by disallowing pointers and references from being ranges). The
naive implementation would therefore just have the copy constructor be
equivalent to save, whereas a more sophisticated implementation would likely
use reference counting and only duplicate the internal state when the
reference count isn't 1, and a function is called which would mutate the
state (e.g. doing the equivalent to calling save when popFront is called,
and the reference count isn't 1). Obviously, the struct wrappers that the
replacement for std.range.interfaces would provide would implement the
correct copying functionality so that save would not be required, and anyone
using a class or interface as a range wouldn't actually have to implement
that behavior either.
6. Basic input ranges will be non-copyable.
Forward ranges can be either value types or pseudo-reference types which are
value types with regards to their iteration state (that is, their actual
elements may be outside of the range type itself, but the iteration state is
copied when the range is copied). The fact that their iteration state can be
copied is what makes it so that they can be forward ranges.
On the other hand, basic input ranges can't have their iteration state
copied; otherwise, they could be forward ranges. So, currently, they are
either reference types, or they are pseudo-reference types where a copy does
not copy the full iteration state, leaving the original in an invalid state
if anything is done to the copy (e.g. front is stored directly in the
struct, but the rest of the elements are on the heap or come from somewhere
else outside the struct like a socket, meaning that front will be stale on
the original after popFront is called on the copy).
So, fundamentally, basic input ranges have different copy semantics than
forward ranges. However, we _can_ have consistent copy semantics for all
basic input ranges and consistent copy semantics for all forward ranges -
just not consistent semantics between the two. The difference in their copy
semantics is fundamentally the difference between those two kinds of ranges.
So, for forward ranges, as described in the previous entry, we fix the
problem of inconsistent copy semantics by requiring that copying them result
in independent copies, whereas we have two options for basic input ranges:
1) Either we need to require that basic input ranges be reference types
(thereby avoiding screwed up copies, because the copying is basically just
copying a pointer)
2) or we need to make them non-copyable - in which case, they can still be
moved, but because they're non-copyable, you don't get situations where
iterating through one copy partially mutates the other, since there's only
ever one copy.
In order to fix the init problem, we already need to disallow classes and
pointers, which would mean that if we required that basic input ranges be
reference types, they would have to be structs wrapping pointers or
references, and it would be difficult to statically determine the difference
between a basic input range and a forward range (since there will no longer
be a save function to do that). So, we'd be forced to give basic input
ranges a different API in order to distinguish them - which isn't all bad,
since it's not always a good idea to have basic input ranges and forward
ranges share the same code given that they have different copy semantics,
but it would be nice to still be able to share code in the cases where the
copy semantics don't matter. Also, requiring that basic input ranges be
reference types would likely require some additional indirections in some
cases, since you then can't put anything directly in the range itself
(because all of the actual state would need to be in the type pointed to by
the wrapper struct in order for it to be a reference type).
On the other hand, if we just make basic input ranges non-copyable, then the
difference between them and forward ranges is then trivial to detect with
__traits(isCopyable, ...), and you can put as much state directly in the
basic input range as you want. The downside, of course, is then that you
can't copy the range. However, it then makes the semantics of what's going
on _very_ clear. A forward range is copyable, and a basic input range is
not.
This should make the fundamental difference between a basic input range and
a forward range stark in a way that it arguably isn't with save, even though
the difference is still fundamentally the same. Forward ranges can have
their iteration state copied, whereas basic input ranges can't. But instead
of having that be determined by a special range API function that most folks
forget to call, it will instead be a clear difference in copy semantics that
will actually result in compilation errors if you get it wrong.
Of course, the major downside here is simply that code will be forced to use
explicit moves in a bunch of cases. But it's certainly better than copying a
range and then using the partially mutated (and thus invalid) original like
you can easily do now. And DIP 1040 should improve the situation
considerably by adding implicit moves for the last use of a variable.
It will also help that most ranges are forward ranges anyway, so while this
will be annoying in some code (probably most commonly in the functions in
Phobos that are written to work with both basic input ranges and forward
ranges), most code probably won't have to deal with it, and the code that
does will then be avoiding subtle copying bugs like it would potentially
have now. So, it's one of those cases where the compilation errors could get
annoying, but they're also clearly preventing bugs.
As an aside, having non-copyable basic input ranges will be beneficial for
the replacement for std.random. It's historically been a problem that
accidentally copying random number generator ranges results in numbers being
repeated, and by making them non-copyable, that won't be a problem anymore.
We can then add a separate function to them (e.g. dup) to explicitly to get
a copy from them when we actually want to.
7. Assigning a forward range to an lvalue of the same type should have the
equivalent semantics as copying a forward range, with the difference of
course being that the lvalue is then completely replaced with the new state
as opposed to generating a completely new copy.
These should be the obvious semantics for assignment, but it's not something
that we can currently rely on for all of the same reasons that we can't
currently rely on the semantics of copying a range.
8. Assigning an basic input range to an lvalue of the same type should only
work if it's a move rather than a copy, since basic input ranges are
non-copyable.
Again, these should be the obvious semantics for assignment given what the
copy semantics are.
9. In order to deal with transient front, we will explicitly disallow it. It
will be a requirement that if front is copied, the copy will be unaffected
by popFront. Of course, this doesn't prevent other references to the same
data from mutating it (e.g. if it's a reference type), but calling popFront
will not mutate it.
Therefore, code which wishes do something like std.stdio's ByLine needs to
use a different solution, e.g. it can use opApply instead, or it can make
front non-copyable, since if it's non-copyable, then popFront can mutate it
without worrying about any copies being mutated.
10. With regards to ranges of non-copyable elements, there will be a clear
way to test for whether a range has copyable elements, and range-based code
that won't work with non-copyable elements should disallow them. This may be
as simple as __traits(isCopyable, ElementType!R), or it may involve a new
trait such as hasCopyableElements!R. It's undecided yet whether an
additional trait will be necessary, but the range documentation will be
clear on the matter, and range-based Phobos code that needs to copy elements
will test for copyability.
11. Finite random-access ranges are required to implement opDollar, and
their opIndex must work with $. Similarly, any ranges which implement
slicing must implement opDollar, and slicing must work with $.
In most cases, this will just be an alias to length, but barring a language
change that automatically treats length as opDollar (which has been
discussed before but has never materialized and is somewhat controversial
given types where it wouldn't make sense to treat length as opDollar), we
have to require that opDollar be defined, or generic code won't be able to
use $ with indexing or slicing. We probably would have required it years ago
except that it would have broken code to add the requirement.
12. We're going to rely on some form of language improvement (such as
Walter's recent Implicit Conversion of Template Instantiations DIP) to solve
the tail-const problem.
As an alternate solution, we could rework the range API to do be more like a
list in a functional language - e.g. head returns the first element, and
tail returns a range containing all of the elements except for the first
one. This would allow us to define tail's return type as being tail-const,
but that could get interesting with assignment (e.g. you can't assign
Range!(const Foo) to const!(Range!Foo)), and basic iteration would require
assignment with such an API, e.g.
for(auto r = range; !r.empty; r = r.tail)
{
auto e = r.head;
}
So, we'd probably be forced to add a separate function just to get a
tail-const copy (since tail wouldn't necessarily return the exact same type
as the range, because tail would be returning a tail-const copy minus head),
e.g.
for(auto r = range.tailConstCopy; !r.empty; r = r.tail)
{
auto e = r.head;
}
And at that point, the actual solution to the tail-const issue isn't really
changing to an API that copies the range so much as providing a function
that returns a tail-const copy and routinely using that, which means that we
could fix the tail-const problem with the current API simply by adding a
function which gives you a tail-const copy. As such, switching to an API
that copies ranges to iterate through them rather than popping off their
elements in-place doesn't really buy us anything.
Regardless, adding a function to get a tail-const copy would be putting us
in pretty much the same boat that we're in now with save - only probably
worse. It would really only be needed in situations where a range has become
const somehow, meaning that most range-based code would not be tested with
such ranges, and the new function would be forgotten just like save (only it
would probably be forgotten even more). So, I'm inclined to think that a
language solution is a much better solution.
Also, with regards to a range API that copies for iteration, concerns have
been raised about its performance vs popFront. I ran some basic tests to see
if that was a problem, and it didn't seem to make any difference, but it
wouldn't surprise me if it did with more complex types (which basically
would have required duplicating a bunch of std.algorithm with a different
range API to compare the performance, which I didn't take the time to do).
So, I don't know how valid those concerns are, but I do agree that it would
be giving the optimizer more to optimize out.
Regardless, I decided that it would just be too large a change to the range
API with too little benifit, and if we can just implement tail-const
conversions in the language (which we arguably should be doing for more than
just ranges), then that solves the problem. And if we decide that we really
do need a new range API function to fix the problem, we can just add it
(though if we do, hopefully we do it before the initial release of Phobos v3
so that we can require it without breaking code).
13. The range API functions for basic input ranges and forward ranges will
be changed from front, popFront, and empty to first, popFirst, and isEmpty.
And the range API functions for bidirectional ranges will be renamed last
and popLast.
I'm not a big fan of this, since it would be nice to keep the same function
names, and making this change requires a language change for foreach.
However, it solves two problems:
First, if we remove save from forward ranges, then the old basic input
ranges look like the new forward ranges so long as they're structs and
copyable. So, if we don't change the forward range API in some fashion, then
it will be easy to pass ranges that aren't really forward ranges to code
that requires forward ranges, and bugs will ensue.
Second, if we want to make it so that dynamic arrays / slices are forward
ranges without requiring an import, then we're going to get a _lot_ of
symbol conflicts. Almost all of the range-based code currently in existence
imports either std.range or std.range.primitives, and _all_ of that code
would get symbol conflicts if we added the range API functions for arrays to
object.d. That can be managed with an Edition, but it would mean that tons
of code would have to be updated with the new Edition. On the other hand, if
we use a different set of function names for the range API functions that
need to be added to object.d for arrays, then we don't get all of those
symbol conflicts, and only new code which is written to use the new range
API will need to worry about the name changes.
Of course, if we add the functions to object.d, we arguably should still do
it with an Edition if we want to minimize the risk of symbol conflicts
(since there could be code in the wild with conflicting symbols; but if we
rename the functions, then we're talking about _possible_ symbol conflicts
rather than the guarantee of a ton of them like we'll get if we don't rename
them).
And https://issues.dlang.org/show_bug.cgi?id=24451 makes putting the current
range API functions in object.d even worse, since selective imports would
only work with std.range.primitives and not std.range, meaning that a lot of
the code that currently uses selective imports would still break, though
if/when that bug is fixed, then that problem goes away.
Of course, if we decide that it's too much trouble to put the range API
functions for arrays in object.d, we could solve the problem of the old
basic input ranges looking like the new forward ranges by renaming only a
single function - e.g. empty to isEmpty - since even a single difference in
the API would be enough to make them different.
Alternatively, we could add an enum of some kind to the new range API to
make it different. It would be kind of ugly, but it would allow us to keep
all of the current function names - though we'd still have to import a
module to use arrays as ranges, so that problem wouldn't be fixed.
Of course, as annoying as it would be to rename the range API functions, it
_would_ have the advantage that you could see at a glance that code was
using the new range API rather than the old one. And that could definitely
be valuable. It also means that if you accidentally use the range traits
from the wrong version of the API, you'll get errors _very_ quickly, since
rather than being partially compatible, they won't be compatible at all.
So, while I'm definitely not happy about renaming the range API functions,
after having considered the options, I think that it's the best path
forward.
Of course, if we rename them, that unfortunately also opens up the door for
bikeshedding, but I think that first, popFirst, isEmpty, last, and popLast
are pretty obvious choices, and they correspond well to the old names.
Either way, it's my intention that Phobos v3 will have wrappers such that
you can easily convert a range from the v2 API to a range that works with
the v3 API and vice versa. In all likelihood, the main annoyance with the
new function names will simply be getting used to them. It'll be easy enough
to update the names if code is ported from v2 to v3, and any such ranges
will need to be looked over to make sure that they match the new
requirements anyway (though most ranges probably already do fulfill most of
the added requirements in this proposal).
Conclusion
----------
So, that covers all of the major points that I can think of right now. Of
course, there will be other minor changes (e.g. I'll probably rename
isForwardRange to isCopyableRange), but those are the core changes, and this
document is already ridiculously long as it is. However, I think that it's
covered what really needs to be covered.
I do need to put together a specification which lists the range API in its
entirety rather than as a proposal for what to change and why to change it
(if nothing else, because some form of that needs to be part of the
documentation), but this document is already ridiculously long without that,
so I'm leaving it as just the list of problems and proposed changes rather
than including a full spec.
In any case, have fun picking apart and complaining about the proposal. :)
- Jonathan M Davis
More information about the Digitalmars-d
mailing list