"in" everywhere

Jonathan M Davis jmdavisProg at gmx.com
Thu Oct 7 15:53:13 PDT 2010


On Thursday, October 07, 2010 14:44:50 Stanislav Blinov wrote:
> Andrei Alexandrescu wrote:
> > Complexity composes very badly. Issues tend to manifest at moderate
> > sizes and may make things unbearable at large sizes. I'm really grateful
> > I'm at a workplace where the exclamation "Damn! I was waiting like an
> > idiot for this quadratic append!" is met with understanding nods from
> > workmates who've made the same mistake before.
> > 
> > As an example, only last week I was working on cramming a sort of an
> > index of the entire Wikipedia on one machine. I was waiting for the
> > indexer which ran slower and slower and slower. In the end I figured
> > there was only _one_ quadratic operation - appending to a vector<size_t>
> > that held document lengths. That wasn't even the bulk of the data and it
> > was the last thing I looked at! Yet it made the run time impossible to
> > endure.
> 
> But that is not a matter of library interface, is it? It's a matter of
> algorithm/container choice. It's not the push_back that was slow in the
> end, it was std::vector (yes, that's arguable, but the point is that the
> container defines the rules for its methods, not vice versa).

Except that when you're dealing with generic code which has to deal with 
multiple container types (like std.algorithm), you _need_ certain complexity 
guarantees about an operation, since it could be applied to any container it's 
given. Using operator in in an algorithm could be perfectly reasonable if in 
had O(1) or O(log n) complexity but completely unreasonable if it had O(n) 
complexity. The algorithm would then be reasonable with some containers and not 
with others. Whereas if in were restricted to O(1) or O(log n), the algorithm 
could use it without worrying that some container would make the whole thing an 
order of complexity greater, or worse.

If it were strictly a matter of writing code which directly used a particular 
algorithm and container type, then you could work out the complexities of the 
algorithm and of the container's operations. But once you're dealing with 
generic code, operations need complexity guarantees regardless of the 
container, or the complexity of the algorithm will vary with the container 
type. And if that algorithm is used inside yet another algorithm, it gets that 
much worse. It becomes easy to bury the point at which what you're trying to do 
jumps an order of complexity, because the container you selected is an order of 
complexity slower than other containers on some operation that an algorithm 
uses, buried in code somewhere.

- Jonathan M Davis
