output ranges: by ref or by value?

Fri Jan 1 06:47:58 PST 2010

Philippe Sigaud wrote:
> On Thu, Dec 31, 2009 at 16:47, Michel Fortin
> <michel.fortin at michelf.com> wrote:
> 
>     On 2009-12-31 09:58:06 -0500, Andrei Alexandrescu
>     <SeeWebsiteForEmail at erdani.org> said:
> 
>         The question of this post is the following: should output ranges
>         be passed by value or by reference? ArrayAppender uses an extra
>         indirection to work properly when passed by value. But if we
>         want to model built-in arrays' operator ~=, we'd need to request
>         that all output ranges be passed by reference.
> 
> 
>     I think modeling built-in arrays is the way to go as it means fewer
>     things to learn. In fact, it makes it easier to learn ranges because
>     you can begin by learning arrays, then transpose this knowledge to
>     ranges which are more abstract and harder to grasp.
> 
> 
> I agree. And arrays may well be the most used range anyway.

Upon more thinking, I'm leaning the other way. ~= is a quirk of arrays 
motivated by practical necessity. I don't want to propagate that quirk 
into ranges. The best output range is one that works properly when 
passed by value.

>     Besides, an extra indirection is wasteful when you don't need it.
>     It's easier to add a new layer of indirection when you need one than
>     the reverse, so the primitive shouldn't require any indirection.
> 
> 
> So (squint through sleep-deprived eyes:) that makes it by ref, right?
>  
> 
> 
>         // pseudo-method
>         void put(R, E)(ref R tgt, E e) {
>            tgt.front = e;
>            tgt.popFront();
>         }

It doesn't. The ref in there passes tgt by reference to the pseudo-method 
put itself. It says nothing about how the function that invokes put 
receives its own range argument.
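
A minimal sketch of that distinction, assuming a current D compiler and 
Phobos. The free function put mirrors the pseudo-method quoted above; 
fill3 is a hypothetical algorithm, not a Phobos function, that takes its 
output range by value and still calls put on its local copy by ref:

```d
import std.range.primitives : front, popFront;

// Mirrors the quoted pseudo-method: ref binds to the caller's variable.
void put(R, E)(ref R tgt, E e)
{
    tgt.front = e;
    tgt.popFront();
}

// Hypothetical algorithm: receives the output range BY VALUE.
R fill3(R)(R tgt)
{
    foreach (i; 0 .. 3)
        put(tgt, i);   // advances fill3's local copy only
    return tgt;        // hands back the unfilled remainder
}

void main()
{
    auto buf = new int[5];
    auto rest = fill3(buf);
    assert(buf == [0, 1, 2, 0, 0]); // elements were written through the slice
    assert(buf.length == 5);        // the caller's slice was not advanced
    assert(rest.length == 2);       // the returned copy was
}
```

The caller's slice keeps its length because only fill3's copy is popped, 
while the element writes are visible because both slices share storage.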

> A few random comments, sorry if they are not entirely coherent:
> 
> - this new put needs hasAssignableElements!R, right? What's in this case 
> the difference between isOutputRange!R and hasAssignableElements!R?

It's a good question. There are two possible designs:

1. In the current design, the difference is that hasAssignableElements!R 
does not imply the range may grow. Consider this:

auto a = new int[10], b = new int[10];
copy(a, b);

This should work. But this shouldn't:

auto a = new int[10], b = new int[5];
copy(a, b);

because copy does not grow the target. If you want to append to b, you 
write:

copy(a, appender(&b));
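
The two cases can be sketched as follows, assuming a current Phobos where 
copy lives in std.algorithm and appender in std.array. The appending half 
uses ~= on the appender rather than passing it to copy, which is a 
conservative substitution for illustration, not the exact call shown above:

```d
import std.algorithm : copy;
import std.array : appender;

void main()
{
    auto a = [1, 2, 3];

    // Target of equal length: copy overwrites elements, never grows.
    auto b = new int[3];
    auto rest = copy(a, b);   // returns the unfilled part of the target
    assert(b == [1, 2, 3]);
    assert(rest.length == 0);

    // To grow an array, go through an appender bound to it by pointer.
    int[] c = [0];
    auto app = appender(&c);
    app ~= a;                 // appends every element of a
    assert(c == [0, 1, 2, 3]);
}
```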

2. In the design sketched in 
http://www.informit.com/articles/printerfriendly.aspx?p=1407357, 
iteration is separated from access. In that approach, you'd have a 
one-pass range for both input and output.

I'm not sure which design is better. (Suggestions are welcome.) For a 
pure output range, it's awkward to define empty() (what should it 
return?) and it's also awkward to put elements by calling two functions 
front/popFront instead of one.

> - should all higher-order ranges expose a put method if possible? 
> (stride comes to mind, but also take or filter).

I don't think so. The generic put will take care of that.

> - does that play nice with the new auto ref / ref template parameter 
> from 2.038? It seems to me that this new feature will go hand in hand 
> with this, but I may be mistaken.

There's no obvious connection. The nice thing about auto ref is this:

struct SomeAdapterRange(R) if (isWhateverRange!R) {
    private R src;
    @property auto ref front() {
       return src.front;
    }
}
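
To make the effect of auto ref concrete, here is a self-contained sketch 
assuming a current compiler. Passthrough is a hypothetical adapter name 
standing in for SomeAdapterRange, and isInputRange stands in for the 
placeholder constraint isWhateverRange:

```d
import std.range.primitives : empty, front, popFront, isInputRange;

// Hypothetical adapter that forwards everything to the wrapped range.
struct Passthrough(R) if (isInputRange!R)
{
    private R src;

    @property bool empty() { return src.empty; }

    // auto ref: returns by reference when src.front is an lvalue,
    // by value otherwise, without writing two versions of front.
    @property auto ref front() { return src.front; }

    void popFront() { src.popFront(); }
}

void main()
{
    auto arr = [1, 2, 3];
    auto r = Passthrough!(int[])(arr);

    r.front = 10;            // legal because int[] yields front by ref
    assert(arr[0] == 10);    // the write reached the underlying array

    r.popFront();
    assert(r.front == 2);
}
```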

You don't want to see how that looks today. Actually:

http://www.dsource.org/projects/phobos/browser/trunk/phobos/std/range.d

Search the page for "mixin" :o}.

> - your shim method will be used like this:
> 
> put(r,e);
> 
> whereas for an output range you use:
> 
> r.put(e);
> 
> and you cannot freely go from one form to the other, except for arrays, 
> which are output ranges anyway [*]. Does that mean that you must 
> disseminate static if ByRef/Assignable/Output/Whatever checks 
> everywhere, to use either put(r,e) or r.put(e)?

The D language automatically rewrites the latter into the former.
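
A small sketch of that rewrite, assuming a current compiler in which the 
member-call syntax is resolved to a module-level free function for any 
type. Tally and its put are hypothetical names used only for illustration:

```d
// Hypothetical output-like type with no put member of its own.
struct Tally
{
    int sum;
}

// Free function at module scope; takes the target by ref.
void put(ref Tally t, int e)
{
    t.sum += e;
}

void main()
{
    Tally t;
    put(t, 1);   // free-function form
    t.put(2);    // member form, rewritten by the compiler to put(t, 2)
    assert(t.sum == 3);
}
```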

> - what if R is a range of ranges (i.e., if E is itself a range)? Should 
> put be invoked recursively? What if it's a chunked range?

I don't know.

> - something I wanted to ask for a long time: does put really write to 
> the range as written in the docs or to the underlying container for 
> which the output range is but a 'writable' view? The container/range 
> separation does not exist for arrays, but is important for other cases.

Depends on how the range is defined. Appender holds a pointer to an 
array and appends to it. But appender is a special-purpose range. An 
ordinary range cannot change the topology of its underlying container.
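
As a rough illustration with built-in arrays, where the container/range 
line is admittedly blurry, the sketch below treats one slice as the 
"container" and a second slice as the range over it. The variable names 
are illustrative only:

```d
void main()
{
    auto container = [1, 2, 3];
    auto view = container[];      // a range over the same elements

    view[0] = 42;                 // element writes go through
    assert(container[0] == 42);

    view = view[1 .. $];          // shrinking the view...
    assert(view.length == 2);
    assert(container.length == 3); // ...does not reshape the container
}
```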

Andrei