std.container & ranges

Wed Nov 2 16:24:03 PDT 2011

On 11/2/2011 2:41 PM, Steven Schveighoffer wrote:
> On Wed, 02 Nov 2011 09:17:39 -0400, Ary Manzana <ary at esperanto.org.ar>
> wrote:
>
>> On 11/2/11 10:12 AM, Steven Schveighoffer wrote:
>>> On Wed, 02 Nov 2011 08:40:19 -0400, Ary Manzana <ary at esperanto.org.ar>
>>> wrote:
>>>
>>>> On 11/2/11 8:48 AM, Steven Schveighoffer wrote:
>>>>>
>>>>> The basic response to this is, when dealing with containers
>>>>> generically
>>>>> (that is, you know you have a container, but you don't know what
>>>>> type),
>>>>> the "remove this element" operation is not necessarily a good
>>>>> primitive
>>>>> to have.
>>>>>
>>>>> Simply because from the myriad of containers, only some can implement
>>>>> this operation efficiently. Java embeds this operation in the
>>>>> interface,
>>>>> which means any interface you have to a container could potentially
>>>>> use
>>>>> O(n) time to remove that element. Such an innocuous piece of syntax
>>>>> *should* have a cost if it's not efficient IMO.
>>>>>
>>>>> BTW, the original question doesn't provide enough information to say
>>>>> "remove this element." Even in Java, if you aren't using the default
>>>>> comparison, you must use a comparator method to determine which one to
>>>>> remove. If cell.x == x && cell.y == y *is* the comparison operator for
>>>>> the type, then the syntax gets much simpler, because you don't need to
>>>>> pass a specialized comparison function.
>>>>>
>>>>> In dcollections, removing a specific element (using the default
>>>>> comparison operator for that element) on a *fast lookup* container
>>>>> is as
>>>>> simple as:
>>>>>
>>>>> container.remove(container.find(x));
>>>>>
>>>>> Which removes the element x if it's found. However, this is not
>>>>> defined
>>>>> for containers which use O(n) time to search (such as linked list),
>>>>> you
>>>>> must use std.algorithm.find for that:
>>>>>
>>>>> container.remove(find(container[], x).begin);
>>>>>
>>>>> Should work, and takes O(n) time.
>>>>>
>>>>> -Steve
>>>>
>>>> I don't really understand what's wrong with inefficient methods. You
>>>> can have inefficient methods that are convenient, like removing an
>>>> element by the default comparison, or giving it a delegate to match
>>>> the element(s) to remove.
>>>>
>>>> You profile your application. Is that method the bottle-neck? If so,
>>>> you change it to a more efficient one. If not, you are happy you had
>>>> that method there, performing in an inefficient way, but which doesn't
>>>> matter that much compared to, say, opening an SQL connection.
>>>>
>>>> Programmers want to program, fast. They have schedules, they need to
>>>> deliver. They don't need to always find the best solution. They can
>>>> find a compromise between "working" and "fast", move on, and later
>>>> profile and worry about what matters most.
>>>>
>>>> Programmers don't want to fight with the language or think "Oh, so to
>>>> remove an element I need to use this operation and combine it with
>>>> that one and with that other one"...
>>>
>>> Or use the right container for the job?
>>>
>>> Where it really comes into play is generic programming.
>>>
>>> Let's say I write some algorithm that removes certain elements from a
>>> container:
>>>
>>> removeElements(C, T)(C c, T t[]...)
>>> {
>>> foreach(x; t)
>>> c.remove(t);
>>> }
>>>
>>> What's the complexity of this algorithm? For a HashSet, for instance, it
>>> will be O(n) where n is the number of elements to remove.
>>>
>>> But for an ArrayList, it will be O(n*m) where m is the number of
>>> elements in c.
>>
>> But I'm sure in this algorithm I have for this app I'm making, my
>> collection won't have more than 50 elements. So everything will be
>> O(1). I need to remove an element from the collection. I really don't
>> care about the complexity of the operation, because if n is 50,
>> everything is O(1).
>
> Then your specific application can use std.algorithm.find to implement
> the equivalent. Only the algorithm body changes, not the usage of the
> algorithm.
>
>>
>> Why can't I have an inefficient (for you) remove operation that, for
>> me, will be ok?
>
> I never said you couldn't (and I've even given examples of such
> implementations). It's just not neatly packaged into a method.
>
> But again, if the method is exactly the same as the efficient version
> for other containers, it becomes *impossible* to design an algorithm
> that guarantees any sort of complexity. As I said before, quadratic sort
> is epic fail, and needs to always be avoided.
>
> I'll give you a scenario:
>
> People often complain that Linked List does not have an opIndex on it.
> Yes it's inefficient, but so what? "I know it's inefficient, let me
> decide whether it's worth it or not."
>
> So let's say I add it to LinkList. Those people are happy.
>
> But now, LinkList becomes defined as a *random-access-range* according
> to std.ranges. Therefore, std.algorithm.sort(linklist) compiles! And is
> now something like O(n^3).
>
> Whereas LinkList already defines a sort method, which uses mergesort
> (O(nlgn)). So are you going to realize this when reading someones code
> and you see:
>
> sort(somelist);
>
> That it's going to be horribly inefficient? Why shouldn't we strive for
> a library where such things just don't compile?
>
>> I don't want to spend 2 hours browsing the ranges algorithms to figure
>> out how to combine them to remove an element...
>
> You severely exaggerate the amount of time needed to browse the
> algorithms. I wrote an equivalent in less than a minute. And I don't
> even look at std.algorithm maybe once every few months (and didn't look
> at it to write the solution).
>
>> My point is, as someone else said it in another post: add inefficient
>> operations and tell the programmer so. Then he can decide what to do.
>> If he's sure that that performance penalty is not a big issue for him,
>> he'll be happy to have that method available.
>
> The operation is available using std.algorithm. There is no need to
> reimplement it as a method, sanctioned by the container. As I said
> before, if you find yourself wanting to remove specific elements from a
> container, perhaps Array is not the smartest choice for a container.
> There are plenty of others.
>
> There is something to be said for a language/library that discourages or
> prevents you from writing dumb code. It means that I'm that much more
> confident a piece of code written in that language is more efficient
> than one written in another language.
>
> -Steve

Hello.

You generally arguing this is all nice and good, but this is a very 
specific case.

I am using a LinkList because in my code, the elements are iterated over 
a million times and during this, I add stuff in-between elements all the 
time. However, I will be removing stuff *very* rarely. I am thus using 
the appropriate container and it doesn't matter whether the remove will 
be inefficient.

To put it another way: if removing elements was of crucial importance to 
the performance of my code in the first place, I wouldn't (and 
shouldn't) be using a LinkList. Therefore, implementing an inefficient 
method which does this won't be of consequence. Finally, the fundamental 
statement I'm trying to make here is: adding and removing *single* 
elements should be a straightforward method call for *any* container.

As a side note, your example about generic programming is really neat 
and makes sense; personally, I would never want a linked list with 
indexes and it's also a horrible analogy to the complaint at hand.

/Max