Range Redesign: Empty Ranges

Wed Mar 6 20:03:35 UTC 2024

On Wednesday, March 6, 2024 10:32:17 AM MST H. S. Teoh via Digitalmars-d 
wrote:
> On Wed, Mar 06, 2024 at 04:47:02PM +0000, Steven Schveighoffer via
> Digitalmars-d wrote: [...]
>
> > The only tricky aspect is ranges that are references
> > (classes/pointers).  Neither of those to me should be supported IMO,
> > you can always wrap such a thing in a range harness.
>
> [...]
>
> Every time this topic comes up, class-based ranges become the whipping
> boy of range design woes.  Actually, they serve an extremely important
> role: type erasure, which is critical when you have code like this:
>
>   auto myRangeFunc(R,S)(R range1, S range2) {
>       if (runtimeDecision()) {
>           return range1;
>       } else {
>           return range2;
>       }
>   }
>
> This generally will not compile because R and S are different types,
> even if both conform to a common underlying range API, like an input
> range. To remedy this situation, class-based range wrappers come to the
> rescue:
>
>   auto myRangeFunc(R,S)(R range1, S range2) {
>       if (runtimeDecision()) {
>           return inputRangeObject(range1);
>       } else {
>           return inputRangeObject(range2);
>       }
>   }
>
> Note that you can't use a struct wrapper here, because R and S have
> different ABIs; the only way to correctly forward range methods to R or
> S is to use overridden base class methods. IOW, the type erasure of R
> and S is unavoidable for this code to work.
>
> //
>
> Also, class-based ranges are sometimes necessary for practical reasons,
> like in this one program I had, that has an UFCS chain consisting of
> hundreds of components (not all written out explicitly in the same
> function, of course, but that's what the underlying instantiation looks
> like).  It generated ridiculously large symbols that crashed the
> compiler. Eventually the bug I filed prodded Rainer to implement symbol
> compression in DMD. But even then, the symbols were still ridiculously
> huge -- because the outermost template instantiation had to encode the
> types of every one of the hundreds of components.  As a result, symbol
> compression or not, compile times were still ridiculously slow because
> the compiler still had to copy those symbols around -- their length grew
> quadratically with every component added to the chain.
>
> Inserting a call to .inputRangeObject in the middle of the chain
> dramatically cut down the size of the resulting symbols, because it
> effectively erased all the types preceding it, resulting in much saner
> codegen afterwards.

My current plan is to retain std.range.interfaces in some form or another so
that we can have ranges as classes as far as their implementation goes (at
my job, we actually have to use classes for ranges, because we have to be
able to wrap arbitrary ranges at runtime, since we're dealing with an
intepreter). However, they would not count as ranges by any range-based
algorithms, because they would be classes, and so they would have to be
wrapped in structs to actually be used as ranges. The wrapper structs would
then do all of the work to ensure that the proper copying behavior was
implemented.

So, you'd do something like

    auto r = inputRangeObject(range1);

and get a struct range wrapping a class from the replacement for
std.range.interfaces. And if we need the ability to get at the underlying
class for something, it's as simple as providing a member to the struct
which returns it - e.g. source, like we do with several other range types.
Then you could do something like

auto r = inputRangeObject(range1);
InputRange cls = r.source;

Of course, you lose access to the class as soon as it's wrapped in another
range type, but that happens right now when you pass ranges around which are
classes. So, that shouldn't really change much from what I can see. It's
basically just wrapping classes in structs so that the struct does the
behavior that we currently do with save (and it could do it more
intelligently by taking advantage of reference counting rather than dup-ing
the class whenever the range is copied).

So, unless I'm missing something here, we _should_ be able to just wrap
classes in structs to allow ranges which are implemented as classes even if
the classes themselves can't be ranges any longer. The net result should be
that dealing with forward ranges is simpler, and we stop having save-related
bugs, but we shouldn't actually lose any functionality.

- Jonathan M Davis