Tricky semantics of ranges & potentially numerous Phobos bugs

Tue Oct 16 13:08:05 PDT 2012

On Tue, Oct 16, 2012 at 09:21:11PM +0200, Jonathan M Davis wrote:
> On Tuesday, October 16, 2012 12:07:01 H. S. Teoh wrote:
> > Perhaps mark ranges with an .isTransient property (which can be an
> > enum so it doesn't waste any space), and range-based functions that
> > require non-transient ranges will refuse to work with a transient
> > range. Or alternatively, switch to a different implementation that
> > takes transience into account (which may be slower, etc., but the
> > important thing is to do it right -- after all, D's motto is safe by
> > default, unsafe if you ask for it).
> 
> Not a bad idea, though it's still arguably a bit disgusting to
> potentially have to check for that all over the place. Inevitably,
> most functions won't check, and ByLine _still_ won't work. Yes, Phobos
> would presumably end up checking in most cases, but I suspect that
> little else will. We arguably have way to many things to check about
> ranges as it is. So, I'd be far more tempted to just change ByLine to
> use opApply rather than adding the extra complication of isTransient
> to the standard library just for this one use case.

But like I said, we shouldn't single out ByLine just because it's what I
happened to run into this time. This is part of a larger issue, which is
that currently, we *assume* that ranges will not invalidate the value of
.front when popFront() is called. But many kinds of ranges don't fulfill
this (currently unstated) requirement. ByLine is only one of the more
common examples.

Another example that I've mentioned is a range which returns the
permutations of an array by swapping array elements in-place.

Yet another example, that actually came up in my own code, is an
expression tree structure that represents potentially multiple values
(it contains operators like ±). There is a method that returns a range
of all possible values the tree can take on. Individual values are in
fact vectors, which means .front returns real[]. It is much more
efficient to implement, for example, ± just by flipping the signs of
each component of the vector in-place, than to .dup every single time.
If one is searching for a specific vector, for example, it would be
wasteful to .dup every real[] that gets generated, only to immediately
discard it.

So this is a non-trivial example of a range that modifies the array
returned by .front when you call popFront(), but is otherwise a
perfectly valid range.

And I argue that these "transient ranges" as I call them do have their
place. One can iterate over them, call find() or canFind() on them,
etc., all the normal things one does with ranges, which are what makes
the concept of ranges so powerful. The only thing is that one has to be
careful about saving the value of .front and then calling popFront().

Marking these ranges with some kind of marker, like isTransient, will
help solve this problem. It doesn't have to be done everywhere: just for
the few ranges that exhibit this property. And it only needs to be
checked by functions that require .front to not mutate under them when
they call popFront(). Most functions in Phobos, I suspect, don't even
need to bother with it. So it should be a relatively small change. Seems
to be a win-win situation to me.

> > I think this approach of what amounts to guessing which ranges are
> > safe/unsafe with which functions is what's untenable. We all know
> > that people don't read documentation, or at least, not thoroughly.
> > Something like this is too easy to miss, and bugs will slip into
> > code unnoticed for a long time, and then explode in your face. It's
> > unsafe by default, which goes against D's philosophy.
> 
> The problem is that what ByLine is doing is incredibly abnormal. So,
> we're talking about affecting how all ranges do things just to satisfy
> one, incredibly abnormal use case. It's just that this one range is
> heavily used, making the fact that it doesn't work normally a problem.

I don't agree that what ByLine does is abnormal. If this were indeed the
case, it would greatly limit the scope of ranges and range-based
functions. You wouldn't be able to implement ranges that reuse buffers
(as I described above), and you may end up being forced to use
inefficient implementations just to satisfy the few functions that
assume that .front doesn't mutate when you call popFront(). I don't see
that as an advantageous position to take. Many range-based functions
*do* work with ByLine and other such ranges, so why impose arbitrary
restrictions on them?

Better to just treat the special cases specially (transient ranges +
functions that assume .front can be safely copied), and let the general
case work as it is already working.

[...]
> So, really, I'm in favor of just making ByLine use opApply and make it
> so that it's not a range. That doesn't completely fix the problem, but
> it reduces its scope and doesn't complicate the rest of Phobos in the
> process.
[...]

This is the simplest way out, I suppose, but it does limit the scope of
range-based functions. Which I think is a loss -- what makes the concept
of ranges so powerful is their sheer generality. I'd hate to lose that
just because a few Phobos functions decide that they can't live with a
transient .front.

T

-- 
BREAKFAST.COM halted...Cereal Port Not Responding. -- YHL