std.container & ranges

Wed Nov 2 06:41:18 PDT 2011

On Wed, 02 Nov 2011 09:17:39 -0400, Ary Manzana <ary at esperanto.org.ar>  
wrote:

> On 11/2/11 10:12 AM, Steven Schveighoffer wrote:
>> On Wed, 02 Nov 2011 08:40:19 -0400, Ary Manzana <ary at esperanto.org.ar>
>> wrote:
>>
>>> On 11/2/11 8:48 AM, Steven Schveighoffer wrote:
>>>>
>>>> The basic response to this is, when dealing with containers  
>>>> generically
>>>> (that is, you know you have a container, but you don't know what  
>>>> type),
>>>> the "remove this element" operation is not necessarily a good  
>>>> primitive
>>>> to have.
>>>>
>>>> Simply because from the myriad of containers, only some can implement
>>>> this operation efficiently. Java embeds this operation in the  
>>>> interface,
>>>> which means any interface you have to a container could potentially  
>>>> use
>>>> O(n) time to remove that element. Such an innocuous piece of syntax
>>>> *should* have a cost if it's not efficient IMO.
>>>>
>>>> BTW, the original question doesn't provide enough information to say
>>>> "remove this element." Even in Java, if you aren't using the default
>>>> comparison, you must use a comparator method to determine which one to
>>>> remove. If cell.x == x && cell.y == y *is* the comparison operator for
>>>> the type, then the syntax gets much simpler, because you don't need to
>>>> pass a specialized comparison function.
>>>>
>>>> In dcollections, removing a specific element (using the default
>>>> comparison operator for that element) on a *fast lookup* container is  
>>>> as
>>>> simple as:
>>>>
>>>> container.remove(container.find(x));
>>>>
>>>> Which removes the element x if it's found. However, this is not  
>>>> defined
>>>> for containers which use O(n) time to search (such as linked list),  
>>>> you
>>>> must use std.algorithm.find for that:
>>>>
>>>> container.remove(find(container[], x).begin);
>>>>
>>>> Should work, and takes O(n) time.
>>>>
>>>> -Steve
>>>
>>> I don't really understand what's wrong with inefficient methods. You
>>> can have inefficient methods that are convenient, like removing an
>>> element by the default comparison, or giving it a delegate to match
>>> the element(s) to remove.
>>>
>>> You profile your application. Is that method the bottle-neck? If so,
>>> you change it to a more efficient one. If not, you are happy you had
>>> that method there, performing in an inefficient way, but which doesn't
>>> matter that much compared to, say, opening an SQL connection.
>>>
>>> Programmers want to program, fast. They have schedules, they need to
>>> deliver. They don't need to always find the best solution. They can
>>> find a compromise between "working" and "fast", move on, and later
>>> profile and worry about what matters most.
>>>
>>> Programmers don't want to fight with the language or think "Oh, so to
>>> remove an element I need to use this operation and combine it with
>>> that one and with that other one"...
>>
>> Or use the right container for the job?
>>
>> Where it really comes into play is generic programming.
>>
>> Let's say I write some algorithm that removes certain elements from a
>> container:
>>
>> removeElements(C, T)(C c, T t[]...)
>> {
>> foreach(x; t)
>> c.remove(t);
>> }
>>
>> What's the complexity of this algorithm? For a HashSet, for instance, it
>> will be O(n) where n is the number of elements to remove.
>>
>> But for an ArrayList, it will be O(n*m) where m is the number of
>> elements in c.
>
> But I'm sure in this algorithm I have for this app I'm making, my  
> collection won't have more than 50 elements. So everything will be O(1).  
> I need to remove an element from the collection. I really don't care  
> about the complexity of the operation, because if n is 50, everything is  
> O(1).

Then your specific application can use std.algorithm.find to implement the  
equivalent.  Only the algorithm body changes, not the usage of the  
algorithm.

>
> Why can't I have an inefficient (for you) remove operation that, for me,  
> will be ok?

I never said you couldn't (and I've even given examples of such  
implementations).  It's just not neatly packaged into a method.

But again, if the method is exactly the same as the efficient version for  
other containers, it becomes *impossible* to design an algorithm that  
guarantees any sort of complexity.  As I said before, quadratic sort is  
epic fail, and needs to always be avoided.

I'll give you a scenario:

People often complain that Linked List does not have an opIndex on it.   
Yes it's inefficient, but so what?  "I know it's inefficient, let me  
decide whether it's worth it or not."

So let's say I add it to LinkList.  Those people are happy.

But now, LinkList becomes defined as a *random-access-range* according to  
std.ranges.  Therefore, std.algorithm.sort(linklist) compiles!  And is now  
something like O(n^3).

Whereas LinkList already defines a sort method, which uses mergesort  
(O(nlgn)).  So are you going to realize this when reading someones code  
and you see:

sort(somelist);

That it's going to be horribly inefficient?  Why shouldn't we strive for a  
library where such things just don't compile?

> I don't want to spend 2 hours browsing the ranges algorithms to figure  
> out how to combine them to remove an element...

You severely exaggerate the amount of time needed to browse the  
algorithms.  I wrote an equivalent in less than a minute.  And I don't  
even look at std.algorithm maybe once every few months (and didn't look at  
it to write the solution).

> My point is, as someone else said it in another post: add inefficient  
> operations and tell the programmer so. Then he can decide what to do. If  
> he's sure that that performance penalty is not a big issue for him,  
> he'll be happy to have that method available.

The operation is available using std.algorithm.  There is no need to  
reimplement it as a method, sanctioned by the container.  As I said  
before, if you find yourself wanting to remove specific elements from a  
container, perhaps Array is not the smartest choice for a container.   
There are plenty of others.

There is something to be said for a language/library that discourages or  
prevents you from writing dumb code.  It means that I'm that much more  
confident a piece of code written in that language is more efficient than  
one written in another language.

-Steve