output ranges: by ref or by value?

Steven Schveighoffer schveiguy at yahoo.com
Sun Jan 3 12:55:16 PST 2010


On Sun, 03 Jan 2010 09:25:25 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail at erdani.org> wrote:

> Steven Schveighoffer wrote:
>
>> Not having opSlice be part of the interface itself does not preclude it  
>> from implementing opSlice, and does not preclude using ranges of it in  
>> std.algorithm.  If I'm not mistaken, all functions in std.algorithm  
>> rely on compile time interfaces.  opApply allows for full input range  
>> functionality for things like copying and outputting where templates  
>> may not be desired.
>
> The point is how much container-independent code can someone write by  
> using the Container interface. If all you have is a Container, you can't  
> use it with any range algorithm.

The answer is, you don't.  Ranges via interfaces are slower than primitive  
functions defined on the interface, so use what's best for interfaces when  
you have an interface and what's best for compile-time optimization when  
you have the full implementation.

>>>> BTW, the primitives in dcollections are:
>>>>  clear(); // clear all elements
>>>> remove(V v); // remove an element
>>>
>>> Search and remove? That's an odd primitive. Why wouldn't you offer an  
>>> interface for iteration that allows an algorithm for search, and a  
>>> primitive for positioned removal?
>>  Search and positioned removal are also a primitives, but not defined  
>> on the interface.  remove was a primitive on Tango's containers, and  
>> dcollections was originally meant to be a replacement for Tango's  
>> containers.
>>  I think the point is, if you have an interface reference, what would  
>> be the minimum functionality needed so that you could use a container  
>> without knowing its implementation.
>
> Yes, and I think remove(V v) does not belong to the minimum  
> functionality. It combines two functions (search and remove) and raises  
> questions such as what happens to duplicate elements.

the function removes one value if it is in the container, zero if it is  
not.

Another primitive interface Multi(V) adds a removeAll(V v) function.

Again, it combines two functions that are awkward or difficult to  
implement using ranges via interfaces.

>>>> contains(V v); // returns whether an element is contained in the  
>>>> colleciton
>>>
>>> I don't think this belongs to primitives. It's O(n) for many  
>>> containers and again it's a generic algorithm, not a member.
>>  Again, it's part of the minimal usable interface.  It's not a generic  
>> algorithm, because some containers can implement this more efficiently.
>
> But this is exactly what I believe to be a mistake: you are abstracting  
> away algorithmic complexity.

Not really, I say right in the docs that contains(V) can take O(n) time.   
There's no abstraction.  I don't say that the algorithmic complexity is  
defined by the implementation.  It's no different than std.algorithm's  
find.

>
>> Plus, to use the generic algorithms, you would need to use interfaces  
>> as ranges which I think are completely useless.
>
> Why?

3 virtual calls per iteration, no possibility to inline, and reference  
copy semantics.  The latter is the biggest problem for me.  I want to be  
able to save ranges (iterators) for later use on containers, and having to  
call to the heap for each save is a deal killer.

>
>>>> length(); // the length of the collection
>>>
>>> That's not a primitive either. std.algorithm.walkLength is. For me,  
>>> all sorts of red flags and alarm buzzers go off when primitives are  
>>> guaranteed that can't be implemented efficiently but by a subset of  
>>> containers. You can't discount complexity as a implementation detail.
>>  All current dcollection containers have O(1) length.
>
> Some containers can't define O(1) length conveniently.

length is actually defined to return ~0 if it cannot compute the length  
quickly :)  Forgot to mention that detail.  So such containers have an  
"out" so to say.

In fact, length is part of a different primitive I defined: Iterator(V)

Iterator(V) defines length and opApply and can be implemented by any  
class, not just containers.  My original goal when I planned dcollections  
for tango was for things like LineInput file readers and such to implement  
the Iterator interface.  That way you could do cool stuff like:

container.add(new LineInput("file.txt")); // add all the lines from the  
file

but it didn't come to fruition, so Iterator is only implemented on  
dcollections...

>>>> dup(); // duplicate the collection
>>>> opApply(int delegate(ref V v) dg); // iterate the collection
>>>> opApply(int delegate(ref bool doPurge, ref V v) dg); // purge the  
>>>> collection
>>>>  That means it covers only empty in your list of must-have functions  
>>>> (via length() == 0).
>>>
>>> How do you implement length() for a singly-linked list? Is empty()  
>>> going to take O(n)?
>>  first, dcollections' list implementation is doubly linked because all  
>> collections are forward and backward iterable.
>>  second, even for singly linked lists, you can have either O(1) length  
>> or O(1) splicing (consuming a link list range into another linked  
>> list).  Dcollections' default link list implementation uses O(1) length  
>> since I think splicing is a specialized requirement.
>
> Right. The question is how much pressure Container is putting on the  
> implementation. I'd rather leave it to the list implementation to decide  
> to store the length or not.

That is possible with my design.  I realize I didn't fully disclose the  
nature of length before.  But of course, there aren't any containers in  
dcollections that *don't* define length.

However, you make it sound like there are so many containers that would  
want to not store the length.  There is only one -- linked lists, and only  
with special requirements (O(1) splicing)

I guess this all goes back to how empty is to be composed.  In  
dcollections, all containers implement O(1) length, so that is not an  
issue.  If you had a specialized linked list implementation, I'm not sure  
how you could compose it with my primitives.  But I don't consider it to  
be a worthwhile deficiency to fix.  most of the time, you care if  
something is empty because you are about to take an element, or iterate  
it.  Just call the thing you want to do, and there are ways to check if it  
didn't return anything.  In any case, empty could be defined as a  
primitive if necessary (in all dcollections it would be written return  
length == 0).

>>>> Add is not a primitive because the Map collections shouldn't assign  
>>>> some random key to the new element.  removeAny is defined only on  
>>>> sets and multisets, but I'm not sure that it couldn't be moved to  
>>>> Collection, in fact, I'll probably do that.
>>>
>>> add is a primitive that takes Tuple!(K, V) as the element type.
>>  How do you define that on Container(V)?  on Map(K, V), set(K k, V v)  
>> is an interface method.
>
> Map!(K, V) has Tuple!(K, V) as its element type.

That makes things awkward.  For example, to remove, you must remove the  
K-V pair.  typically you only want to specify the K or the V.  I realize  
it doesn't make things awkward for your container interface since you  
don't *define* a remove function, but your container interface isn't very  
useful as a general container.

>>>> Note that it's missing begin and end which are defined on every  
>>>> single container type (i.e. the equivalent of the all-elements  
>>>> range).  This is because those primitives return a struct that is  
>>>> different for every container type.
>>>
>>> So you can't write container-independent iteration code unless you use  
>>> opApply, in which case composition becomes tenuous.
>>  No, you can easily write container-independent iteration as long as  
>> you have the implementation.
>
> In this context: container-independent = using the Container interface.  
> This is the whole purpose of creating a container hierarchy. If the  
> container design fosters knowing the implementation, maybe a class  
> hierarchy is not the right choice in the first place.
>
>> If you use interfaces you can write opApply wrappers to do different  
>> things.  I'm not sure what you mean by composition.
>
> For example, compose ranges a la retro or take.

how do you compose retro on an input range?  that's all you got with your  
container interface.  It's the same for opApply.

take is simple:

class TakeIterator(V) : Iterator(V)
{
    private Iterator!V src;
    private uint nelems;
    public this(Iterator!V src, uint nelems) { this.src = src; this.nelems  
= nelems;}
    public int opApply(int delegate(ref V v) dg)
    {
       uint elems = 0;
       int result = 0;
       foreach(ref v; src)
       {
           if(elems++ > nelems || (result = dg(v)) != 0)
		break;
       }
       return result;
    }
    public uint length()
    {
       uint len = src.length();
       return len == NO_LENGTH_SUPPORT ? NO_LENGTH_SUPPORT : len < nelems ?  
len : nelems;
    }
}


>
>>>> It also surpasses opSlice via opApply, since all an input range can  
>>>> do anyways is iterate.  In fact, opApply is more powerful because you  
>>>> can change elements and remove elements (via purging).  Plus it's  
>>>> more efficient than a range-via-interface.
>>>
>>> An input range is a subset of other (more powerful) ranges. It's also  
>>> much more flexible. I agree that calling range primitives via  
>>> interfaces can become an efficiency liability.
>>  How is it more flexible?  You can't replace data, and you can't remove  
>> data while iterating, both of which are possible with dcollection's  
>> primitives.  If you have a Container(E) which defines InputRange!E  
>> opSlice, how do you get at the more defined range definition?  casting?
>
> You can replace data by assigning to range's elements. Removal is done  
> via positioned remove (which I think needs to be a primitive).

That is available in dcollections, just not part of the interface.  Every  
container implements remove(cursor) function, but since cursors are  
implementation specific, it can't be part of the interface.

>>  Not having opSlice on the interface guarantees you never have a  
>> virtual call for iteration :)  opApply mitigates the virtual call on  
>> the interface.
>
> And takes away the ability to compose ranges and to use algorithms with  
> the container.

As long as you only have the interface and not the actual container.  I  
have no problem with that.

>
>>> Above all: the primitive set for a container must be a small set of  
>>> functions that (a) can be implemented by all containers within  
>>> reasonable efficiency bounds, and (b) are container-specific, not  
>>> generic. IMHO any container design that defines a search(Element) as a  
>>> primitive is misguided. Searching is not a container primitive - it's  
>>> an algorithm. Only more specialized containers (e.g. trees, hashes  
>>> etc.) can afford to define search(Element) as a primitive. Linear  
>>> search is a generic algorithm that works the same for everyone. It  
>>> does not belong as a method of any container.
>>  If the minimal container design isn't usable without std.algorithm,  
>> then I don't think it's worth having interfaces.
>
> Why?

If I have to pull out std.algorithm to do simple things like remove a  
specific element from a container, where std.algorithm may not do the most  
efficient thing, then why the hell have an interface in the first place?   
If I have an interface, I want to make it do all the things all containers  
can do, not delegate it to some other part of the library.  I trust that  
the interface knows how to do container functions in the best way possible.

>
> I think the other way: if the minimal container design is unusable with  
> std.algorithm, the design took a wrong turn somewhere.

If the OO *interface* does not support std.algorithm, then that's good by  
me, seeing as how you have to use non-inlinable virtual functions to do  
simple tasks on ranges that cannot be copied without allocating on the  
heap.  I think absolutely containers should support std.algorithm via the  
method std.algorithm is usable by any other range -- compile-time  
interfaces.

>> If you need std.algorithm, you need the full implementation of the  
>> container because it's a compile-time interface.
>
> How did you reach that conclusion? std.algorithm uses a syntactic  
> interface that is obeyed by interfaces too. There's no problem.

It's a bastardized hack to use interfaces with std.algorithm.  I think  
it's as useful as functional qsort.  Yeah the big-O is the same, but the  
implementation sucks.

>> Interface ranges are something that should be avoided, it's like having  
>> a programming language where everything has to be a class.
>
> I disagree. The negation of an implementation dogma can be just as  
> limiting as the dogma. The way I see it, a design defines some  
> desiderata. Then it does whatever it takes to fulfill them.

You're starting to get too smart for me here :)  I have no idea what  
disrderata means.

> If one desideratum is to use a class hierachy to write  
> container-independent code, then interface ranges naturally follow.  
> There's no ifs and buts about it.

The container hierarchy can support two *different* paradigms.  First, the  
paradigm of runtime interfaces which may be useful for using containers to  
store pieces of data to go with an object.  Such as a socket cache that  
maps ip addresses to sockets.  This cache cares nothing about sorting or  
doing retro iteration on the socket cache.  It cares not about the  
implementation of the map, just that the map does map-like things (i.e.  
assign this key to this value, remove this key, etc.).

The other paradigm is to have access to std.algorithm in the most  
efficient way possible.  Such access requires full implementation  
knowledge to make the most efficient implementation of algorithms.  In  
fact, containers can *use* std.algorithm underneath their member functions  
if so desired.

>> What you are saying seems completely incorrect to me: "since not all  
>> containers can implement fast search, I'm going to guarantee that *all*  
>> containers use a linear search via their interface.
>
> This is a misunderstanding. In the STL linear containers don't define  
> find(), but associative containers do. That is the correct design.

But STL isn't accessed without the full implementation, you are comparing  
apples to oranges here.

You said that find is something that should be relegated to  
std.algorithm.  Isn't std.algorithm's find a linear search?  If I have a  
Container!E, don't I need to use std.algorithm to search for an element  
with it's returned range?  How is this not forcing a linear search when  
using an interface to a container?  What is the answer to that, don't  
search for an element in a container unless you have the full  
implementation?  That works in dcollections too!

>> *AND* I want to make each loop in the search algorithm call 3 virtual  
>> functions!"
>
> Not necessarily. This happens only if you use the Container interface to  
> write container-independent code. It is the cost it takes for the design  
> to fulfill its desiderata.

Is that not the case we are discussing?

>
>> How is that better than a search function that guarantees linear  
>> performance but gives the option of being as fast as possible with no  
>> loop virtual calls?
>
> It is better because it doesn't write off search complexity as an  
> implementation detail. "Linear or better" is not a good guarantee. A  
> good guarantee is "From this node of the hierarchy down, this primitive  
> is defined to guarantee O(log n) search". Linear search is a generic  
> algorithm and does not belong to any container.

That is available via the full implementation.  With the interface, it's  
the "best you can do", because that's all that is available.  "linear or  
better" is a better guarantee than "linear with guaranteed 3 no-inlining  
virtual function calls per element."

-Steve



More information about the Digitalmars-d mailing list