output ranges: by ref or by value?

Fri Jan 1 13:49:45 PST 2010

Jason House wrote:
> Andrei Alexandrescu wrote:
> 
>> Jason House wrote:
>>> Andrei Alexandrescu wrote:
>>> 
>>>> Philippe Sigaud wrote:
>>>>> On Thu, Dec 31, 2009 at 16:47, Michel Fortin
>>>>> <michel.fortin at michelf.com> wrote:
>>>>> 
>>>>> On 2009-12-31 09:58:06 -0500, Andrei Alexandrescu 
>>>>> <SeeWebsiteForEmail at erdani.org> said:
>>>>> 
>>>>> The question of this post is the following: should output
>>>>> ranges be passed by value or by reference? ArrayAppender uses
>>>>> an extra indirection to work properly when passed by value.
>>>>> But if we want to model built-in arrays' operator ~=, we'd
>>>>> need to request that all output ranges be passed by
>>>>> reference.
>>>>> 
>>>>> 
>>>>> I think modeling built-in arrays is the way to go as it makes
>>>>> fewer things to learn. In fact, it makes it easier to learn
>>>>> ranges because you can begin by learning arrays, then
>>>>> transpose this knowledge to ranges which are more abstract
>>>>> and harder to grasp.
>>>>> 
>>>>> 
>>>>> I agree. And arrays may well be the most used range anyway.
>>>> Upon more thinking, I'm leaning the other way. ~= is a quirk of
>>>> arrays motivated by practical necessity. I don't want to
>>>> propagate that quirk into ranges. The best output range is one
>>>> that works properly when passed by value.
>>> I worry about a growing level of convention with ranges.  Another
>>> recent range thread discussed the need to call consume after a
>>> successful call to startsWith.  If I violated convention and had
>>> a range class, things would fail miserably.  There would be no
>>> need to consume after a successful call to startsWith and the
>>> range would have a random number of elements removed on an
>>> unsuccessful call to startsWith. I'm pretty sure that early 
>>> discussions of ranges claimed that they could be either structs
>>> or classes, but in practice that isn't the case.
>> I am implementing right now a change in the range interface
>> mentioned in 
>> http://www.informit.com/articles/printerfriendly.aspx?p=1407357,
>> namely: add a function save() that saves the iteration state of a
>> range.
>> 
>> With save() in tow, class ranges and struct ranges can be used the
>> same way. True, if someone forgets to say
>> 
>> auto copy = r.save();
>> 
>> and instead says:
>> 
>> auto copy = r;
>> 
>> the behavior will indeed be different for class ranges and struct
>> ranges.
> 
> Or if they completely forgot that bit of convention and omit creating
> a variable called save... Also, doesn't use of save degrade
> performance for structs? Or does the inliner/optimizer remove the
> copy variable altogether?

It may be best to discuss this on an example:

/**
If $(D startsWith(r1, r2)), consume the corresponding elements off $(D
r1) and return $(D true). Otherwise, leave $(D r1) unchanged and
return $(D false).
  */
bool consume(R1, R2)(ref R1 r1, R2 r2)
         if (isForwardRange!R1 && isInputRange!R2)
{
     auto r = r1.save();
     while (!r2.empty && !r.empty && r.front == r2.front) {
         r.popFront();
         r2.popFront();
     }
     if (r2.empty) {
         r1 = r;
         return true;
     }
     return false;
}

For most structs, save() is very simple:

auto save() { return this; }

For classes, save() entails creating a new object:

auto save() { return new typeof(this)(this); }
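
To make the difference concrete, here is a minimal sketch of one struct range and one class range, each with its own save(). The names SliceRange and ClassRange are hypothetical and exist only for this illustration; neither is a Phobos type, and the copying constructor on the class is an assumption about how save() could be implemented.

```d
// Hypothetical range types for illustration only; not part of Phobos.
struct SliceRange
{
    int[] data;
    bool empty() const { return data.length == 0; }
    int front() const { return data[0]; }
    void popFront() { data = data[1 .. $]; }
    // Copying the struct copies the slice, which is the whole iteration state.
    SliceRange save() { return this; }
}

final class ClassRange
{
    private int[] data;
    this(int[] data) { this.data = data; }
    // Copying constructor used by save().
    this(ClassRange other) { this.data = other.data; }
    bool empty() const { return data.length == 0; }
    int front() const { return data[0]; }
    void popFront() { data = data[1 .. $]; }
    // A new object is needed so the copy iterates independently.
    ClassRange save() { return new ClassRange(this); }
}

void main()
{
    auto s = SliceRange([1, 2, 3]);
    auto sCopy = s.save();
    sCopy.popFront();
    assert(s.front == 1);     // the original struct range is untouched

    auto c = new ClassRange([1, 2, 3]);
    auto cSaved = c.save();
    cSaved.popFront();
    assert(c.front == 1);     // save() produced an independent object

    auto cAlias = c;          // plain assignment copies only the reference
    cAlias.popFront();
    assert(c.front == 2);     // c advanced along with its alias
}
```

The last three lines of main() show the pitfall: for a class, plain assignment yields a second reference to the same iteration state rather than a saved position.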

If the implementor of consume() forgets to call save(), the situation is 
unpleasant albeit not catastrophic: for most struct ranges things will 
continue to work, because copying the struct copies the iteration state. 
For class ranges the copy merely aliases r1, so the function will fail to 
perform to spec: r1 is advanced even when the match fails. I don't know 
how to improve on that.
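
The failure mode can be sketched as follows. This is only an illustration under stated assumptions: Digits and Slice are hypothetical range types, consumeSaved repeats the logic of consume() without its template constraints, and consumeForgetful is the same function with the save() call replaced by a plain copy.

```d
// Hypothetical class range used only for this illustration.
final class Digits
{
    private int[] data;
    this(int[] data) { this.data = data; }
    bool empty() const { return data.length == 0; }
    int front() const { return data[0]; }
    void popFront() { data = data[1 .. $]; }
    Digits save() { return new Digits(data); }
}

// Hypothetical struct range with the same elements and interface.
struct Slice
{
    int[] data;
    bool empty() const { return data.length == 0; }
    int front() const { return data[0]; }
    void popFront() { data = data[1 .. $]; }
    Slice save() { return this; }
}

// Same logic as consume(), minus the template constraints.
bool consumeSaved(R1, R2)(ref R1 r1, R2 r2)
{
    auto r = r1.save();
    while (!r2.empty && !r.empty && r.front == r2.front)
    {
        r.popFront();
        r2.popFront();
    }
    if (r2.empty) { r1 = r; return true; }
    return false;
}

// The mistake under discussion: a plain copy instead of save().
bool consumeForgetful(R1, R2)(ref R1 r1, R2 r2)
{
    auto r = r1; // for a class, r and r1 now refer to the same object
    while (!r2.empty && !r.empty && r.front == r2.front)
    {
        r.popFront();
        r2.popFront();
    }
    if (r2.empty) { r1 = r; return true; }
    return false;
}

void main()
{
    // With save(), a failed match leaves the class range unchanged.
    auto a = new Digits([1, 2, 3, 4]);
    assert(!consumeSaved(a, new Digits([1, 2, 9])));
    assert(a.front == 1);

    // Without save(), the class range loses the two matched elements.
    auto b = new Digits([1, 2, 3, 4]);
    assert(!consumeForgetful(b, new Digits([1, 2, 9])));
    assert(b.front == 3);

    // The struct range is unaffected by the same mistake.
    auto s = Slice([1, 2, 3, 4]);
    assert(!consumeForgetful(s, Slice([1, 2, 9])));
    assert(s.front == 1);
}
```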

Anyway, it's not entirely a convention. I'll change isForwardRange to 
require the existence of save().

Andrei