output ranges: by ref or by value?

Andrei Alexandrescu SeeWebsiteForEmail at erdani.org
Sun Jan 3 14:19:52 PST 2010


Steven Schveighoffer wrote:
> On Sun, 03 Jan 2010 09:25:25 -0500, Andrei Alexandrescu 
> <SeeWebsiteForEmail at erdani.org> wrote:
> 
>> Steven Schveighoffer wrote:
>>
>>> Not having opSlice be part of the interface itself does not preclude 
>>> it from implementing opSlice, and does not preclude using ranges of 
>>> it in std.algorithm.  If I'm not mistaken, all functions in 
>>> std.algorithm rely on compile time interfaces.  opApply allows for 
>>> full input range functionality for things like copying and outputting 
>>> where templates may not be desired.
>>
>> The point is how much container-independent code can someone write by 
>> using the Container interface. If all you have is a Container, you 
>> can't use it with any range algorithm.
> 
> The answer is, you don't.  Ranges via interfaces are slower than 
> primitive functions defined on the interface, so use what's best for 
> interfaces when you have an interface and what's best for compile-time 
> optimization when you have the full implementation.

I understand we don't agree on this. To me, making containers work with 
algorithms is a drop-dead requirement so I will rest my case.

Nevertheless, I think there's one point that got missed. It's a tad 
subtle, and I find it pretty cool because it's the first time I used 
covariant returns for something interesting. Consider:

interface InputIter(T) { ... }
interface ForwardIter(T) : InputIter!T { ... }

class Container(T) {
     InputIter!T opSlice() { ... }
}

class Array(T) : Container!T {
     class Iterator : ForwardIter!T {
         ... all final functions ...
     }
     override Iterator opSlice() { ... }
}

Now there are two use cases:

a) The user uses Array specifically.

auto a = new Array!int;
sort(a[]);

In this case, sort gets a range of type Array!(int).Iterator, which 
defines final primitives. Therefore, the compiler does not emit ONE 
virtual call throughout. I believe this point was lost.

b) The user wants generality and OOP-style so uses Container throughout:

Container!int a = new Array!int;
auto found = find(a[], 5);

This time find gets an InputIter!int, so it will use virtual calls for 
iteration.

The beauty of this design is that without any code duplication it 
clearly spans the spectrum between static knowledge and dynamic 
flexibility - within one design and one implementation. This is the 
design I want to pursue.
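
For concreteness, here is one way the sketch above could be filled in. The 
iteration primitives (empty/front/popFront/save) and the member names are 
illustrative only - nothing here commits to them - but the covariance 
mechanics are exactly as described:

// Illustrative sketch only: the primitive names are assumptions,
// not a committed API.
interface InputIter(T)
{
    bool empty();
    T front();
    void popFront();
}

interface ForwardIter(T) : InputIter!T
{
    ForwardIter!T save();
}

abstract class Container(T)
{
    abstract InputIter!T opSlice();
}

class Array(T) : Container!T
{
    private T[] data;

    static class Iterator : ForwardIter!T
    {
        private T[] payload;
        // final: a call through a reference of static type Iterator
        // needs no virtual dispatch and can be inlined.
        final bool empty() { return payload.length == 0; }
        final T front() { return payload[0]; }
        final void popFront() { payload = payload[1 .. $]; }
        final Iterator save() { auto i = new Iterator; i.payload = payload; return i; }
    }

    // Covariant return: this overrides Container!T.opSlice, yet callers
    // that statically know they have an Array!T get the final Iterator.
    override Iterator opSlice()
    {
        auto it = new Iterator;
        it.payload = data;
        return it;
    }
}

With this in place, case (b)'s find(a[], 5) through a Container!int 
reference sees an InputIter!int and dispatches virtually, while the same 
call on an Array!int instantiates the algorithm with Array!int.Iterator, 
whose primitives are all final. (Case (a)'s sort would additionally need 
random-access and assignment primitives on the iterator; they are elided 
from this sketch.)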

>> Yes, and I think remove(V v) does not belong to the minimum 
>> functionality. It combines two functions (search and remove) and 
>> raises questions such as what happens to duplicate elements.
> 
> the function removes one value if it is in the container, zero if it is 
> not.

So you need a way to do searching and a way to do positioned remove. Why 
combine them into one, when you can use both separately to great effect?
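
To illustrate the separation (the names are hypothetical, not a proposed 
API): assuming c[] yields a range over the container and c.remove(r) 
erases exactly the elements spanned by r, remove-by-value is something 
the user composes on demand:

import std.algorithm : find;
import std.range : take;

// Hypothetical container API: c[] returns a range, c.remove(r) erases
// the elements spanned by r. Search and removal stay separate primitives.
void removeFirst(C, V)(C c, V value)
{
    auto r = find(c[], value);   // searching: a generic algorithm
    if (!r.empty)
        c.remove(take(r, 1));    // removal: a positioned container primitive
}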

> Another primitive interface Multi(V) adds a removeAll(V v) function.

Why do you need a separate interface for removeAll? Are there containers 
that can remove one value but not all?

> Again, it combines two functions that are awkward or difficult to 
> implement using ranges via interfaces.

I contend that point.

>>>>> contains(V v); // returns whether an element is contained in the 
>>>>> collection
>>>>
>>>> I don't think this belongs to primitives. It's O(n) for many 
>>>> containers and again it's a generic algorithm, not a member.
>>>  Again, it's part of the minimal usable interface.  It's not a 
>>> generic algorithm, because some containers can implement this more 
>>> efficiently.
>>
>> But this is exactly what I believe to be a mistake: you are 
>> abstracting away algorithmic complexity.
> 
> Not really, I say right in the docs that contains(V) can take O(n) 
> time.  There's no abstraction.  I don't say that the algorithmic 
> complexity is defined by the implementation.  It's no different than 
> std.algorithm's find.
> 
>>
>>> Plus, to use the generic algorithms, you would need to use interfaces 
>>> as ranges which I think are completely useless.
>>
>> Why?
> 
> 3 virtual calls per iteration, no possibility to inline, and reference 
> copy semantics.  The latter is the biggest problem for me.  I want to be 
> able to save ranges (iterators) for later use on containers, and having 
> to call to the heap for each save is a deal killer.

Per my argument above, there may be zero virtual calls per iteration. (I 
want to make sure my point about covariance was understood even if it is 
not considered compelling.)

> However, you make it sound like there are so many containers that would 
> want to not store the length.  There is only one -- linked lists, and 
> only with special requirements (O(1) splicing)

I agree it's tenuous to leave length() out. It's a judgment call.

>> Map!(K, V) has Tuple!(K, V) as its element type.
> 
> That makes things awkward.  For example, to remove, you must remove the 
> K-V pair.  typically you only want to specify the K or the V.  I realize 
> it doesn't make things awkward for your container interface since you 
> don't *define* a remove function, but your container interface isn't 
> very useful as a general container.

It is. What is true is a converse of your statement - a general 
container isn't very useful as a map - which I'd agree with, and is 
sensible OO design. As long as a map is seen as a container, it _does_ 
store K-V pairs. If I know it's a map, great, it defines a different 
primitive for removing an element given its key.
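
In code, the two views could be layered like this (a standalone sketch 
with illustrative member names; Tuple is from std.typecons):

import std.typecons : Tuple;

// Sketch only: Container!T stands in for whatever the generic
// container interface ends up being (here reduced to a stub).
interface Container(T)
{
    size_t length();
}

interface Map(K, V) : Container!(Tuple!(K, V))
{
    // As a container, a map ranges over its K-V pairs.
    // As a map, it adds key-based primitives:
    V opIndex(K key);
    bool removeKey(K key);
}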

>>> If you use interfaces you can write opApply wrappers to do different 
>>> things.  I'm not sure what you mean by composition.
>>
>> For example, compose ranges a la retro or take.
> 
> how do you compose retro on an input range?  that's all you got with 
> your container interface.  It's the same for opApply.

You need to be a bit down in the hierarchy to use retro.

> take is simple:
> 
> class TakeIterator(V) : Iterator!V
> {
>    private Iterator!V src;
>    private uint nelems;
>    public this(Iterator!V src, uint nelems) { this.src = src; this.nelems = nelems; }
>    public int opApply(int delegate(ref V v) dg)
>    {
>       uint elems = 0;
>       int result = 0;
>       foreach(ref v; src)
>       {
>          if(elems++ >= nelems || (result = dg(v)) != 0)
>             break;
>       }
>       return result;
>    }
>    public uint length()
>    {
>       uint len = src.length();
>       return len == NO_LENGTH_SUPPORT ? NO_LENGTH_SUPPORT : len < nelems ? len : nelems;
>    }
> }

And then how do I compose take with something else, or pass it to 
std.algorithm.find()? This whole thing doesn't hold water.
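
For contrast, composing the same operations over compile-time ranges costs 
nothing and feeds straight into std.algorithm. The helper below is mine, 
purely to show the shape of the composition; it works with any input 
range, including whatever a container's opSlice returns:

import std.algorithm : find;
import std.range : isInputRange, take;

// take composes lazily with the range, and the result is itself a range
// that find (or any other algorithm) accepts directly.
auto findAmongFirst(R, V)(R r, size_t n, V value)
    if (isInputRange!R)
{
    return find(take(r, n), value);
}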

>> You can replace data by assigning to range's elements. Removal is done 
>> via positioned remove (which I think needs to be a primitive).
> 
> That is available in dcollections, just not part of the interface.  
> Every container implements remove(cursor) function, but since cursors 
> are implementation specific, it can't be part of the interface.

Well that's a problem with the design, no? Why then define a hierarchy? 
A simple set of unrelated types may be a better design.

> If I have to pull out std.algorithm to do simple things like remove a 
> specific element from a container, where std.algorithm may not do the 
> most efficient thing, then why the hell have an interface in the first 
> place?

Things can be arranged such that std.algorithm does do the best thing, 
and is also the most general and reusable approach. The STL has partly 
shown that. I plan to take that one step further - to show that 
container-independent code can be gainfully written with a hierarchy of 
containers (something that STL failed at).

> If I have an interface, I want to make it do all the things all 
> containers can do, not delegate it to some other part of the library.  I 
> trust that the interface knows how to do container functions in the best 
> way possible.

The interface must be expressive enough to let generic algorithms do 
their job.

>> I think the other way: if the minimal container design is unusable 
>> with std.algorithm, the design took a wrong turn somewhere.
> 
> If the OO *interface* does not support std.algorithm, then that's good 
> by me, seeing as how you have to use non-inlinable virtual functions to 
> do simple tasks on ranges that cannot be copied without allocating on 
> the heap.

Did my covariance argument above convince you otherwise?

> I think absolutely containers should support std.algorithm 
> via the method std.algorithm is usable by any other range -- 
> compile-time interfaces.

I agree :o).

>> I disagree. The negation of an implementation dogma can be just as 
>> limiting as the dogma. The way I see it, a design defines some 
>> desiderata. Then it does whatever it takes to fulfill them.
> 
> You're starting to get too smart for me here :)  I have no idea what 
> desiderata means.

Come on, the Internet is there. I meant "essential goal".

>> If one desideratum is to use a class hierachy to write 
>> container-independent code, then interface ranges naturally follow. 
>> There's no ifs and buts about it.
> 
> The container hierarchy can support two *different* paradigms.  First, 
> the paradigm of runtime interfaces which may be useful for using 
> containers to store pieces of data to go with an object.  Such as a 
> socket cache that maps ip addresses to sockets.  This cache cares 
> nothing about sorting or doing retro iteration on the socket cache.  It 
> cares not about the implementation of the map, just that the map does 
> map-like things (i.e. assign this key to this value, remove this key, 
> etc.).
> 
> The other paradigm is to have access to std.algorithm in the most 
> efficient way possible.  Such access requires full implementation 
> knowledge to make the most efficient implementation of algorithms.  In 
> fact, containers can *use* std.algorithm underneath their member 
> functions if so desired.

Yah. Covariance takes care of all of the above.

I'd almost agree with you as of a couple of months ago, when the whole 
covariance thing hadn't occurred to me.

>>> What you are saying seems completely incorrect to me: "since not all 
>>> containers can implement fast search, I'm going to guarantee that 
> *all* containers use a linear search via their interface."
>>
>> This is a misunderstanding. In the STL linear containers don't define 
>> find(), but associative containers do. That is the correct design.
> 
> But STL isn't accessed without the full implementation, you are 
> comparing apples to oranges here.
> 
> You said that find is something that should be relegated to 
> std.algorithm.

YES. If a design must define linear search in more than one place, then 
that design is suboptimal. This is an important goal I want to follow.

> Isn't std.algorithm's find a linear search?

Yes.

> If I have a 
> Container!E, don't I need to use std.algorithm to search for an element 
> with it's returned range?

No. It's like in the STL - either the container defines its own 
searching primitives (e.g. set or map), or you can always grab the 
container's range and use the well-known linear search defined by 
std.algorithm.

> How is this not forcing a linear search when 
> using an interface to a container?

It isn't forcing a linear search. It is fostering awareness of the 
capabilities the code requires of its container.

A design that defines a best-effort find() as a member is suboptimal. 
There's no guarantee it can provide. A putative user cannot tell 
anything about the complexity of their code that uses the find() member. 
A good design has the user say, hey, linear search is ok here; in this 
other place, I need a fast find so I can't use Container!T, I need 
AssociativeContainer!T.
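
A sketch of what that choice looks like at the call site (both constraints 
are placeholders for wherever they land in the hierarchy):

import std.algorithm : find;

// C is any container whose c[] yields an input range; S is any container
// that promises a fast contains() of its own. Both are illustrative; the
// point is that the caller picks the complexity knowingly.
void client(C, S)(C c, S s)
{
    auto r = find(c[], 5);         // deliberately linear - the caller opted in
    bool present = s.contains(5);  // sub-linear, because S's contract says so
}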

> What is the answer to that, don't 
> search for an element in a container unless you have the full 
> implementation?  That works in dcollections too!

The shape and characteristic of the search is defined by the type of the 
container.

A great library that got mentioned here in the past:

http://www.gobosoft.com/eiffel/gobo/structure/container.html

I strongly recommend studying that design; it is very close to 
optimality. They define DS_SEARCHABLE as a subclass of DS_CONTAINER - as 
they should.
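
In D terms, that layering might look like the following (a rough analogue, 
not Gobo's actual API):

// Rough D analogue of the Gobo layering; names and guarantees illustrative.
interface Container(T)
{
    size_t length();
}

// From this node of the hierarchy down, search is part of the contract,
// with a stated complexity bound rather than "linear or better".
interface SearchableContainer(T) : Container!T
{
    bool contains(T value);  // e.g. documented O(log n) or better
}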

>>> *AND* I want to make each loop in the search algorithm call 3 virtual 
>>> functions!"
>>
>> Not necessarily. This happens only if you use the Container interface 
>> to write container-independent code. It is the cost it takes for the 
>> design to fulfill its desiderata.
> 
> Is that not the case we are discussing?

No, there are two cases as I discussed above in this post.

>>> How is that better than a search function that guarantees linear 
>>> performance but gives the option of being as fast as possible with no 
>>> loop virtual calls?
>>
>> It is better because it doesn't write off search complexity as an 
>> implementation detail. "Linear or better" is not a good guarantee. A 
>> good guarantee is "From this node of the hierarchy down, this 
>> primitive is defined to guarantee O(log n) search". Linear search is a 
>> generic algorithm and does not belong to any container.
> 
> That is available via the full implementation.  With the interface, it's 
> the "best you can do", because that's all that is available.  "linear or 
> better" is a better guarantee than "linear with guaranteed 3 no-inlining 
> virtual function calls per element."

I think one issue might be that you think "interface" and 
"implementation" with nothing in between. I'm talking "gradually more 
specialized interface", as the Gobo library above defines.


Andrei


