Issue with forward ranges which are reference types
Jonathan M Davis
jmdavisProg at gmx.com
Tue Aug 16 21:05:54 PDT 2011
Sorry that this is long, but it's very important IMHO, and I don't know how to
make it much shorter and cover what it's supposed to cover.
Okay. Your typical forward range is either an array a struct which is a value
type (that is, copying it creates an independent range which points to the
same elements and is not altered if the original range is altered - the
elements that it points to aren't copied of course). So, when you want to get
a copy, it's as easy as
auto rangeCopy = range;
It was previously determined that this would be a problem for ranges which are
reference types (classes in particular, but it affects structs as well, if
copying them doesn't create an independent range). So, we added the save
property.
auto rangeCopy = range.save;
That way, when we need to save the state of a range, we always use save, even
if that particular range would be copied by a simple assignement. Okay. So far
so good. There is one major problem with the current situation though: the
behavior of value-type and reference-type forward ranges is very different when
they are passed to range-based functions.
We use save within a particular algorithm when we know that we need to copy a
range's state, but simply passing a range to a function may or may not copy a
range. Take this possible implementation of a drop function, for instance:
R drop(R)(R range, size_t n)
if(isInputRange!R)
{
popFrontN(range, n);
return range;
}
It pops n elements off of the range and returns the range sans those elements.
In the case of an input range, the original range is n elements shorter - as
expected. In the case of forward ranges though, it varies. If you pass an
array or a struct with value semantics to drop, then the original range is not
altered. Just passing it to drop is equivalent to calling save. However, if
you use a range which is a reference type, then just like with an input range,
the original range is altered. So, the behavior of code could change
drastically depending on whether it's given a range which is a value type or a
range which is a reference type - even if they're both forward ranges.
auto valRange = "hello world";
assert(equal(drop(valRange, 5), " world");
assert(equal(valRange, "hello world");
auto refRange = createRefRange("hello world");
assert(equal(drop(refRange, 5), " world");
assert(equal(refRange, " world"));
So, the question is, should a range-based function have the same behavior for
all forward ranges regardless of whether they're value types or reference
types? Or should the caller be aware of whether a range is a value type or a
reference type and call save if necessary? Or should the caller just always
call save when passing a forward range to a function?
Option 1: If we make it so that all functions behave identically for all
forward ranges, then we're going to need something like this at the beginning
of every range-based function for all ranges passed to that function:
static if(isForwardRange!R) range = range.save;
This is definitely a bit tedious, but it's completely doable. And with some
proper tests for it, it's easy to catch whether a function actually does this
like it's supposed to. However, while in most cases, the compiler should be
able to optimize out this assignment for structs, it can't always do it. IIUC,
if the struct defines a postblit constructor or if any of its member variables
define a postblit constructor (or if any of their member variables declare a
postblite construcotr, or any of their member variables' member variables...),
then the save call and assignment won't be able to be optimized out, and there
will be a performance cost. However, it's likely that very few range types
will have postblit constructors, given the usual simplicity of their design
and the fact if they actually need a postblit constructor, they can probably
just forgoe it in favor of using the save property for all copying, making
them reference types.
Option 2: On the other hand, we could make it so that the caller just has to
be aware of whether a range is value type or a reference type and always call
save when passing a reference type forward range to a function. This avoids
the potential performance penalty but is very error-prone. Instead of the
range-based function worrying about it, now _every_ function that calls a
range-based function must worry about it, and that's quite likely to be error-
prone. It's also likely to be somewhat problematic in generic code (which
range-based code frequently is), since you can't exactly test for whether a
range is a reference type or not, forcing you to pretty much call save all of
the time when passing ranges to functions in generic code.
Option 2 is essentially what we've been doing in Phobos except that we don't
actually test reference type ranges at all, and the odds are that a lot of
Phobos doesn't actually handle reference type ranges correctly - meaning that
even if the caller remembers to call save before passing such a range to a
Phobos range-based function, there's a good chance that the function won't
work correctly.
Option 3: Or, we could just say that you should _always_ call save when
passing a forward range to a function. That way, you avoid the issue of trying
to figure out whether a range is a reference type or not. It's also less error-
prone in that the fact that you're always doing it makes you less likely to
forget to do it when you need to. However, you have the irritation of having
to check for input ranges (since you can't call save on them) and essentially
end up doing the exact same thing as option 1 except that you're doing it at
every call point instead of once inside of the function definition. So, you
really haven't gained much over putting it inside of the function and made it
more error-prone than option 1since you have to remember to do it everywhere
that you call a function instead of just the once in its definition.
So, the question is which option is the better one? Or is there another option
that I haven't thought of? It's quite clear to me that we're going to need to
add unit tests to Phobos to verify the behavior of Phobos functions when
dealing with ranges which are reference types, but I think that we need a
clear strategy on how to deal with the fact that value-type forward ranges are
automatically saved when they're passed to a function whereas reference-type
forward ranges are not.
Thoughts?
- Jonathan M Davis
More information about the Digitalmars-d
mailing list