Transient ranges

Jonathan M Davis via Digitalmars-d digitalmars-d at puremagic.com
Sun May 29 21:17:00 PDT 2016


On Sunday, May 29, 2016 13:36:24 Steven Schveighoffer via Digitalmars-d wrote:
> On 5/27/16 9:48 PM, Jonathan M Davis via Digitalmars-d wrote:
> > On Friday, May 27, 2016 23:42:24 Seb via Digitalmars-d wrote:
> >> So what about the convention to explicitly declare a
> >> `.transient` enum member on a range, if the front element value
> >> can change?
> >
> > Honestly, I don't think that supporting transient ranges is worth it.
> > Every single range-based function would have to either test that the
> > "transient" enum wasn't there or take transient ranges into account,
> > and realistically, that isn't going to happen. For better or worse, we
> > do have byLine in std.stdio, which has a transient front, but aside from
> > the performance benefits, it's been a disaster.
>
> Wholly disagree. If we didn't cache the element, D would be a
> laughingstock of performance-minded tests.

Having byLine not copy its buffer is fine. Having it be a range is not.
Algorithms in general just do not play well with that behavior, and I don't
think that it's reasonable to expect them to.

> > It's way too error-prone. We now have
> > byLineCopy to combat that, but of course, byLine is the more obvious
> > function and thus more likely to be used (plus it's been around longer),
> > so a _lot_ of code is going to end up using it, and a good chunk of that
> > code really should be using byLineCopy.
>
> There's nothing actually wrong with using byLine, and copying on demand.
> Why such a negative connotation?

Because it does not play nicely with ranges, and aside from a few rare
ranges like byLine that have to deal directly with I/O, transience isn't
even useful. Having an efficient solution that plays nicely with I/O is
definitely important, but it doesn't need to be a range, especially when it
complicates ranges in general. byLine doesn't even work with
std.array.array (every element of the result ends up being a slice of the
same reused buffer), and if even that doesn't work, I don't see how a range
could be considered well-behaved.
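
To make the std.array.array problem concrete, here is a minimal sketch. It does not use byLine itself (that would need real input); instead it assumes a hypothetical ReusedBufferRange that mimics byLine by handing out slices of one reused buffer. The type and the two helper functions are made up for illustration.

```d
import std.algorithm.iteration : map;
import std.array : array;
import std.stdio : writeln;

// Hypothetical range, not part of Phobos: like byLine, front is a slice
// of a single buffer that is overwritten on every popFront.
struct ReusedBufferRange
{
    private const(string)[] source;
    private char[] buffer;

    this(const(string)[] source)
    {
        this.source = source;
        buffer = new char[](16); // assumes no entry is longer than 16 chars
        fill();
    }

    @property bool empty() const { return source.length == 0; }
    @property char[] front() { return buffer[0 .. source[0].length]; }
    void popFront() { source = source[1 .. $]; fill(); }

    private void fill()
    {
        if (!empty)
            buffer[0 .. source[0].length] = source[0][];
    }
}

// Stores each front as-is, so every element aliases the same buffer.
char[][] collectNaively(const(string)[] lines)
{
    return ReusedBufferRange(lines).array;
}

// Copies each front before storing it (roughly what byLineCopy does).
string[] collectWithCopy(const(string)[] lines)
{
    return ReusedBufferRange(lines).map!(l => l.idup).array;
}

void main()
{
    writeln(collectNaively(["one", "two", "six"]));  // all entries show the last line
    writeln(collectWithCopy(["one", "two", "six"])); // entries are distinct copies
}
```

The naive version compiles without complaint, which is the point: nothing in the type tells array that it must copy.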

> > I'm of the opinion that if you want a transient front, you should just use
> > opApply and skip ranges entirely.
>
> So you want to make this code invalid? Why?
>
> foreach(i; map!(a => a.to!int)(stdin.byLine))
> {
>     // process each integer
>     ...
> }
>
> You want to make me copy each line to a heap-allocated string so I can
> parse it?!!

If it's a range, then it can be passed around to other algorithms with
impunity, and almost nothing is written with the idea that a range's front
is transient. There's no way to check for transience, and I don't think
that it's even vaguely worth adding yet another range primitive that has to
be checked for everywhere just for this case. Transience does _not_ play
nicely with algorithms in general.

Using opApply doesn't completely solve the problem (since the buffer could
still escape - we'd need some kind of scope attribute or wrapper to fix
that), but it makes it so that you can't pass such a range around and run
into trouble with all of the algorithms that don't play nicely with it.
So, instead, you end up with code that looks something like

foreach(line; stdin.byLine())
{
    auto i = line.to!int();
    ...
}

And yes, it's slightly longer, but it prevents a whole class of bugs by not
having it be a range with a transient front.
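
For what the opApply side might look like, here is a rough sketch. BufferedLines is a hypothetical type (nothing like it exists in std.stdio under that name), and it reads from an in-memory string rather than a File so that the example is self-contained. It reuses one buffer for every line, works with foreach, and deliberately provides no front, empty, or popFront.

```d
import std.conv : to;
import std.range.primitives : isInputRange;
import std.stdio : writeln;
import std.string : lineSplitter;

// Hypothetical type, not part of Phobos: usable with foreach, but not a range.
struct BufferedLines
{
    string text; // stands in for a file or stdin

    int opApply(scope int delegate(const(char)[] line) dg)
    {
        char[] buffer; // grown as needed and reused for every line
        foreach (piece; text.lineSplitter)
        {
            if (buffer.length < piece.length)
                buffer.length = piece.length;
            auto line = buffer[0 .. piece.length];
            line[] = piece[];
            if (auto result = dg(line))
                return result; // propagate break/return from the loop body
        }
        return 0;
    }
}

// No range primitives, so it cannot be handed to map, array and friends.
static assert(!isInputRange!BufferedLines);

int sumLines(string text)
{
    int total = 0;
    foreach (line; BufferedLines(text))
        total += line.to!int; // convert while the buffer is still valid
    return total;
}

void main()
{
    writeln(sumLines("10\n20\n12"));
}
```

As noted above, the loop body can still stash the slice somewhere, so this narrows the problem rather than eliminating it.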

> > Allowing for front to be transient -
> > whether you can check for it or not - simply is not worth the extra
> > complications. I'd love it if we deprecated byLine's range functions, and
> > made it use opApply instead and just declare transient ranges to be
> > completely unsupported. If you want to write your code to have a transient
> > front, you can obviously take that risk, but you're on your own.
>
> There is no way to disallow front from being transient. In fact, it
> should be assumed that it is the default unless it's wholly a value-type.

Pretty much no range-based code is written with the idea that front is
transient. It's pretty much the opposite. Unfortunately, we can't check for
all of the proper range semantics at compile time (be it having to do with
transience, the fact that front needs to be the same every time until
popFront is called, that save has to actually result in a range that will
have exactly the same elements, or whatever other runtime behavior that
ranges are supposed to adhere to), but just because something can't be
checked for doesn't mean that it should be considered reasonable or valid.
IMHO, a range with a transient front should be considered as valid as a
range that returns a different value every time that front is called without
popFront having been called. Neither can be tested for, but both cause
problems.

If we're going to support transience, then we _need_ to have some sort of
flag/enum in the type to indicate that the range is transient, but that
complicates everything, because then all range implementations have to check
for it and pass it on when they wrap that type, and many algorithms will
have to explicitly check for it in their template constraints to make it
invalid. You end up with a whole lot of extra machinery in range-based code
to support a very small number of ranges.
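
To give an idea of the machinery involved, here is a sketch under the assumption that the flag is an `enum transient` member as Seb suggested. The names isTransient, Digits, Limited and collect are all hypothetical; none of this exists in Phobos. It shows the three pieces: a trait to detect the flag, a wrapper that has to forward it, and an algorithm that has to reject it in its constraint.

```d
import std.array : array;
import std.range.primitives : empty, front, isInputRange, popFront;
import std.stdio : writeln;

// Hypothetical trait: true if R declares `enum transient = true;`.
template isTransient(R)
{
    static if (is(typeof(R.transient) : bool))
        enum isTransient = R.transient;
    else
        enum isTransient = false;
}

// Hypothetical transient range: front is always the same one-char buffer.
struct Digits
{
    enum transient = true; // the flag every such range would have to declare
    private char[] buffer;
    private int next, limit;

    this(int limit) // assumes limit <= 10
    {
        buffer = new char[](1);
        this.limit = limit;
        fill();
    }

    @property bool empty() const { return next >= limit; }
    @property char[] front() { return buffer; }
    void popFront() { ++next; fill(); }
    private void fill() { if (!empty) buffer[0] = cast(char)('0' + next); }
}

// Every wrapper has to remember to pass the flag on.
struct Limited(R)
if (isInputRange!R)
{
    private R source;
    private size_t remaining;

    static if (isTransient!R)
        enum transient = true;

    @property bool empty() { return remaining == 0 || source.empty; }
    @property auto front() { return source.front; }
    void popFront() { source.popFront(); --remaining; }
}

// Every algorithm that stores elements has to reject the flag.
auto collect(R)(R range)
if (isInputRange!R && !isTransient!R)
{
    return range.array;
}

static assert(isTransient!Digits);
static assert(isTransient!(Limited!Digits));
static assert(!isTransient!(Limited!(int[])));
static assert(!__traits(compiles, collect(Limited!Digits(Digits(3), 2))));

void main()
{
    writeln(collect(Limited!(int[])([1, 2, 3], 2)));
}
```

If a single wrapper forgets the static if, the flag is silently lost and the constraint on collect no longer protects anything.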

The number of things that range-based code has to check for is already
arguably way too high without adding yet more into the mix.

- Jonathan M Davis