Ranges again
John Colvin via Digitalmars-d
digitalmars-d at puremagic.com
Wed Apr 16 10:11:32 PDT 2014
Construction and Initialisation
===============================
As previously discussed, it's annoying having the conflation of
construction and initialisation. It leads to two options:
1) construction is initialisation. This is nasty for some ranges
where the range is truly destructive, amongst a whole load of
other problems.
2) lazy initialisation. This is annoying to implement (hello
bugs) and potentially slow.
How about a convention to overcome this:
A new, optional, range primitive named initialize or prime is
introduced that does the initialisation.
Constructors are not guaranteed to initialise a range if an
initialize/prime member is present.
factory functions (e.g. std.range.iota for std.range.Iota) call
the constructor, followed by initialise/prime if present, unless
explicitly documented otherwise.
All entities that accept a range expect that range to be
pre-initialised unless explicitly documented otherwise.
What we do:
Document it. Publicise it.
Change all phobos ranges that are implemented as private
aggregates (struct/class) with public factory functions to the
new convention. Note that many will not need any change at all as
they have no nasty initialisation needs. This is a non-breaking
change.
Consider changing some phobos ranges with public aggregates,
where the improvement is sufficient to justify the potential
breakage for people using the raw constructors.
Nothing else. All current code with it's various initialisation
conventions will function exactly as before, any breakage is
strictly controlled on a case-by-case basis for each individual
range type.
Open Questions:
Does save always return an initialised range?
Iteration Protocol
==================
Proposal:
A) The !empty -> front (as many times as you like*) -> popFront
-> repeat sequence is used for all ranges.
B) empty and front must appear to do what they say they do** when
used according to the above, but may internally do whatever they
want. One exception: Consecutive calls to front are *not*
guaranteed to return the same value, as this doesn't play well
with random access ranges, in particular std.algorithm.map.***
C) as a consequence of B: empty and front are both independently
optional. Of course it is still an error to call front or
popFront on an empty range.
D) Ranges are not guaranteed to be internally buffered (e.g. see
caveat about front in B), but many will be for performance and/or
correctness (see WRT streams below). Inevitably some sort of
caching wrapper and a byEfficientChunk primitive are likely to
emerge, whether standardised or not.
* not deliberately by design, more just as an accidental
side-effect of B.
** non-destructive, does not advance the range, does what it says
on the tin.
*** Also, a range might work with data that is being mutated
while the range is iterating/being indexed/sliced.
WRT streams:
empty for streams requires (attempting to) read from the stream,
which is in turn destructive. Thus, implementations of streams as
ranges *must* be buffered, to preserve the illusion of empty
being a simple non-destructive check. My understanding is that
they will always be buffered on some level anyway for performance.
General notes on everything above
=================================
Ranges are not the solution to every possible iteration problem.
They are not a panacea. I believe that attempts to overly
generalise them will cause more harm than good. With some
sensible conventions we can have a really great tool, but there
will always be some situations where a different approach will be
more performant, intuitive or safer. Yes, that means you
sometimes can't use std.algorithm and std.range, which is sad,
but you can't have everything.
Those who want to really squeeze every last bit of performance
out of ranges or make use of non-conforming range-like objects
can use UDAs to tag various properties of their types, and then
have their generic algorithms special-case on those attributes
for performance and/or correctness. This enables the concept of
ranges to be extended without overly straining the core mechanics.
There might even be a place for some standardised UDAs in
std.range and some special-casing in std.algorithm/range.
However we proceed, we need an "I don't satisfy this range
requirement even though it looks like I do" indicator. I suggest
using a UDA, introduced as standard (in std.range, not the
language). E.g. @(std.range.notRandomAccessRange) myRange { ... }
would fail isRandomAccessRange even if it contains the necessary
primitives.
P.S.
Most of the parts of this have probably been proposed by other
people before, which I've probably read and forgotten about. No
originality claimed here :)
More information about the Digitalmars-d
mailing list