output ranges: by ref or by value?

Steven Schveighoffer schveiguy at yahoo.com
Sun Jan 3 05:28:06 PST 2010


On Sun, 03 Jan 2010 00:49:08 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail at erdani.org> wrote:

> Steven Schveighoffer wrote:
>> My theory is, given this list of ranges, if you pair them with an  
>> algorithm that requires save capability, you wouldn't want to use that  
>> algorithm on it anyways (kinda like the consume example).
>
> Why gratuitously limit the design? You're asking to replace this:
>
> R save() { return this; }
>
> with:
>
> enum thisIsAForwardRange = true;
>
> Is there a reason? The former leaves in flexibility. The latter doesn't,  
> for no good reason.

Well, one thing you could do is:

enum thisIsAnInputRange = true;

and then no special implementation is needed for normal forward ranges.   
the other point is there is no special treatment needed inside algorithms  
-- the risk of forgetting to use save at the right points of the algorithm  
is higher than forgetting to say isForwardRange!(R) at the beginning of  
the function.

>
>>> Struct ranges won't work with a container hierarchy. If you define a  
>>> container hierarchy (classes etc.) you'll also need a range hierarchy.  
>>> Otherwise defining the inheritance relationship is impossible.  
>>> Consider:
>>>
>>> abstract class Container(E) { // most general container
>>>      @property bool empty();
>>>      bool add(E element);
>>>      E removeAny();
>>>      InputRange!E opSlice();
>>> }
>>>
>>> That's what I'd expect any container worth its salt to implement: (a)  
>>> test for empty; (b) add an element to the container; (c) remove some  
>>> element from the container; (d) get a range that spans the entire  
>>> container. (Probably removeAll() and a range-positioned remove() would  
>>> make sense, too.)
>>  The range interface is compile-time only, there is no need to define  
>> it in the container interface.  opSlice does not need to be part of the  
>> interface, even if it is defined on all the container types.
>>  opApply makes for a much better interface-friendly iteration anyways.
>
> I want container users to be able to save iteration state and also to  
> open up containers to std.algorithm and other goodies, so I'm shying  
> away from opApply.

Not having opSlice be part of the interface itself does not preclude it  
 from implementing opSlice, and does not preclude using ranges of it in  
std.algorithm.  If I'm not mistaken, all functions in std.algorithm rely  
on compile time interfaces.  opApply allows for full input range  
functionality for things like copying and outputting where templates may  
not be desired.

>
>> BTW, the primitives in dcollections are:
>>  clear(); // clear all elements
>> remove(V v); // remove an element
>
> Search and remove? That's an odd primitive. Why wouldn't you offer an  
> interface for iteration that allows an algorithm for search, and a  
> primitive for positioned removal?

Search and positioned removal are also a primitives, but not defined on  
the interface.  remove was a primitive on Tango's containers, and  
dcollections was originally meant to be a replacement for Tango's  
containers.

I think the point is, if you have an interface reference, what would be  
the minimum functionality needed so that you could use a container without  
knowing its implementation.

>
>> contains(V v); // returns whether an element is contained in the  
>> colleciton
>
> I don't think this belongs to primitives. It's O(n) for many containers  
> and again it's a generic algorithm, not a member.

Again, it's part of the minimal usable interface.  It's not a generic  
algorithm, because some containers can implement this more efficiently.   
Plus, to use the generic algorithms, you would need to use interfaces as  
ranges which I think are completely useless.

>
>> length(); // the length of the collection
>
> That's not a primitive either. std.algorithm.walkLength is. For me, all  
> sorts of red flags and alarm buzzers go off when primitives are  
> guaranteed that can't be implemented efficiently but by a subset of  
> containers. You can't discount complexity as a implementation detail.

All current dcollection containers have O(1) length.

>
>> dup(); // duplicate the collection
>> opApply(int delegate(ref V v) dg); // iterate the collection
>> opApply(int delegate(ref bool doPurge, ref V v) dg); // purge the  
>> collection
>>  That means it covers only empty in your list of must-have functions  
>> (via length() == 0).
>
> How do you implement length() for a singly-linked list? Is empty() going  
> to take O(n)?

first, dcollections' list implementation is doubly linked because all  
collections are forward and backward iterable.

second, even for singly linked lists, you can have either O(1) length or  
O(1) splicing (consuming a link list range into another linked list).   
Dcollections' default link list implementation uses O(1) length since I  
think splicing is a specialized requirement.

>
>> Add is not a primitive because the Map collections shouldn't assign  
>> some random key to the new element.  removeAny is defined only on sets  
>> and multisets, but I'm not sure that it couldn't be moved to  
>> Collection, in fact, I'll probably do that.
>
> add is a primitive that takes Tuple!(K, V) as the element type.

How do you define that on Container(V)?  on Map(K, V), set(K k, V v) is an  
interface method.

what you can do is define Map(K, V) as inheriting Container(Tuple!(K, V)),  
but then trying to use the container functions are very cumbersome.  In  
dcollections, Map(K, V) inherits Collection(V).

>
>> Note that it's missing begin and end which are defined on every single  
>> container type (i.e. the equivalent of the all-elements range).  This  
>> is because those primitives return a struct that is different for every  
>> container type.
>
> So you can't write container-independent iteration code unless you use  
> opApply, in which case composition becomes tenuous.

No, you can easily write container-independent iteration as long as you  
have the implementation.

If you use interfaces you can write opApply wrappers to do different  
things.  I'm not sure what you mean by composition.

>
>> It also surpasses opSlice via opApply, since all an input range can do  
>> anyways is iterate.  In fact, opApply is more powerful because you can  
>> change elements and remove elements (via purging).  Plus it's more  
>> efficient than a range-via-interface.
>
> An input range is a subset of other (more powerful) ranges. It's also  
> much more flexible. I agree that calling range primitives via interfaces  
> can become an efficiency liability.

How is it more flexible?  You can't replace data, and you can't remove  
data while iterating, both of which are possible with dcollection's  
primitives.  If you have a Container(E) which defines InputRange!E  
opSlice, how do you get at the more defined range definition?  casting?

>
>>>> I see a range as being useful for iteration or algorithms, but not  
>>>> for general use.  A great example is AAs.  Would you say that an AA  
>>>> *is* a range or should *provide* a range?  If it is a range, does  
>>>> that mean you remove elements as you call popFront?  Does that make  
>>>> any sense?  If it doesn't, then what happens if you add elements  
>>>> through another alias to that AA?
>>>
>>> An AA provides several ranges - among which byKey and byValue.
>>  I misunderstood your statement "[a container hierarchy] does need  
>> range interfaces."  I thought you meant containers themselves need to  
>> implement the range interface, I see now that isn't the case, so my bad.
>
> Yah, they'd offer it as a result of opSlice(). Covariant return types  
> will ensure there's no virtual call when you know what container you  
> operate on.

Not having opSlice on the interface guarantees you never have a virtual  
call for iteration :)  opApply mitigates the virtual call on the interface.

> Above all: the primitive set for a container must be a small set of  
> functions that (a) can be implemented by all containers within  
> reasonable efficiency bounds, and (b) are container-specific, not  
> generic. IMHO any container design that defines a search(Element) as a  
> primitive is misguided. Searching is not a container primitive - it's an  
> algorithm. Only more specialized containers (e.g. trees, hashes etc.)  
> can afford to define search(Element) as a primitive. Linear search is a  
> generic algorithm that works the same for everyone. It does not belong  
> as a method of any container.

If the minimal container design isn't usable without std.algorithm, then I  
don't think it's worth having interfaces.  If you need std.algorithm, you  
need the full implementation of the container because it's a compile-time  
interface.  Interface ranges are something that should be avoided, it's  
like having a programming language where everything has to be a class.

What you are saying seems completely incorrect to me: "since not all  
containers can implement fast search, I'm going to guarantee that *all*  
containers use a linear search via their interface.  *AND* I want to make  
each loop in the search algorithm call 3 virtual functions!"  How is that  
better than a search function that guarantees linear performance but gives  
the option of being as fast as possible with no loop virtual calls?  In  
fact, my implementation is guaranteed to be faster than std.algorithm  
because of the lack of virtual calls.

-Steve



More information about the Digitalmars-d mailing list