Issue with forward ranges which are reference types

Tue Aug 16 21:05:54 PDT 2011

Sorry that this is long, but it's very important IMHO, and I don't know how to 
make it much shorter and cover what it's supposed to cover. 

Okay. Your typical forward range is either an array a struct which is a value 
type (that is, copying it creates an independent range which points to the 
same elements and is not altered if the original range is altered - the 
elements that it points to aren't copied of course). So, when you want to get 
a copy, it's as easy as

auto rangeCopy = range;

It was previously determined that this would be a problem for ranges which are 
reference types (classes in particular, but it affects structs as well, if 
copying them doesn't create an independent range). So, we added the save 
property.

auto rangeCopy = range.save;

That way, when we need to save the state of a range, we always use save, even 
if that particular range would be copied by a simple assignement. Okay. So far 
so good. There is one major problem with the current situation though: the 
behavior of value-type and reference-type forward ranges is very different when 
they are passed to range-based functions.

We use save within a particular algorithm when we know that we need to copy a 
range's state, but simply passing a range to a function may or may not copy a 
range. Take this possible implementation of a drop function, for instance:

R drop(R)(R range, size_t n)
    if(isInputRange!R)
{
    popFrontN(range, n);
    return range;
}

It pops n elements off of the range and returns the range sans those elements. 
In the case of an input range, the original range is n elements shorter - as 
expected. In the case of forward ranges though, it varies. If you pass an 
array or a struct with value semantics to drop, then the original range is not 
altered. Just passing it to drop is equivalent to calling save. However, if 
you use a range which is a reference type, then just like with an input range, 
the original range is altered. So, the behavior of code could change 
drastically depending on whether it's given a range which is a value type or a 
range which is a reference type - even if they're both forward ranges.

auto valRange = "hello world";
assert(equal(drop(valRange, 5), " world");
assert(equal(valRange, "hello world");

auto refRange = createRefRange("hello world");
assert(equal(drop(refRange, 5), " world");
assert(equal(refRange, " world"));

So, the question is, should a range-based function have the same behavior for 
all forward ranges regardless of whether they're value types or reference 
types? Or should the caller be aware of whether a range is a value type or a 
reference type and call save if necessary? Or should the caller just always 
call save when passing a forward range to a function?

Option 1: If we make it so that all functions behave identically for all 
forward ranges, then we're going to need something like this at the beginning 
of every range-based function for all ranges passed to that function:

static if(isForwardRange!R) range = range.save;

This is definitely a bit tedious, but it's completely doable. And with some 
proper tests for it, it's easy to catch whether a function actually does this 
like it's supposed to. However, while in most cases, the compiler should be 
able to optimize out this assignment for structs, it can't always do it. IIUC, 
if the struct defines a postblit constructor or if any of its member variables 
define a postblit constructor (or if any of their member variables declare a 
postblite construcotr, or any of their member variables' member variables...), 
then the save call and assignment won't be able to be optimized out, and there 
will be a performance cost. However, it's likely that very few range types 
will have postblit constructors, given the usual simplicity of their design 
and the fact if they actually need a postblit constructor, they can probably 
just forgoe it in favor of using the save property for all copying, making 
them reference types.

Option 2: On the other hand, we could make it so that the caller just has to 
be aware of whether a range is value type or a reference type and always call 
save when passing a reference type forward range to a function. This avoids 
the potential performance penalty but is very error-prone. Instead of the 
range-based function worrying about it, now _every_ function that calls a 
range-based function must worry about it, and that's quite likely to be error-
prone. It's also likely to be somewhat problematic in generic code (which 
range-based code frequently is), since you can't exactly test for whether a 
range is a reference type or not, forcing you to pretty much call save all of 
the time when passing ranges to functions in generic code.

Option 2 is essentially what we've been doing in Phobos except that we don't 
actually test reference type ranges at all, and the odds are that a lot of 
Phobos doesn't actually handle reference type ranges correctly - meaning that 
even if the caller remembers to call save before passing such a range to a 
Phobos range-based function, there's a good chance that the function won't 
work correctly.

Option 3: Or, we could just say that you should _always_ call save when 
passing a forward range to a function. That way, you avoid the issue of trying 
to figure out whether a range is a reference type or not. It's also less error-
prone in that the fact that you're always doing it makes you less likely to 
forget to do it when you need to. However, you have the irritation of having 
to check for input ranges (since you can't call save on them) and essentially 
end up doing the exact same thing as option 1 except that you're doing it at 
every call point instead of once inside of the function definition. So, you 
really haven't gained much over putting it inside of the function and made it 
more error-prone than option 1since you have to remember to do it everywhere 
that you call a function instead of just the once in its definition.

So, the question is which option is the better one? Or is there another option 
that I haven't thought of? It's quite clear to me that we're going to need to 
add unit tests to Phobos to verify the behavior of Phobos functions when 
dealing with ranges which are reference types, but I think that we need a 
clear strategy on how to deal with the fact that value-type forward ranges are 
automatically saved when they're passed to a function whereas reference-type 
forward ranges are not.

Thoughts?

- Jonathan M Davis