Marketing of D - article topic ideas?

retard re at tard.com.invalid
Fri Jun 11 04:23:59 PDT 2010


Wed, 09 Jun 2010 09:37:51 -0500, Andrei Alexandrescu wrote:

> On 06/09/2010 01:28 AM, retard wrote:
>> Max of n unordered elements can be solved in O(log log n) time assuming
>> you have enough cores and constant time memory access. Happy now?
> 
> When calculating the complexity of an operation you don't consider cores
> in unlimited supply; it's a constant. Complexity being calculated in
> terms of the number of inputs, max of n elements is O(n) because you
> need to look at each element at least once.
> 
> Cores being a constant k, you can then consider that max can be
> distributed such that each core looks at n/k elements. What's left is k
> intermediate results, which can be further processed in log(k) time (see
> below); since we consider k a constant, this approach could claim a net
> k-fold speedup.

By this logic, every parallel algorithm has the same asymptotic complexity 
as its sequential version, because the core count always disappears into 
the constant factor.

> If available cores are proportional to n (an unusual assumption), you
> first compare pairs of the n elements with n/2 cores, then n/4
> comparisons against pairs of the remaining elements etc. Each step
> halves the number of candidates so the complexity is O(log n). I'd be
> curious to find out how that can be reduced to O(log log n).

First, divide the input into n/log(log(n)) blocks of log(log(n)) elements 
each and solve them sequentially; this stage is O(log log n) because each 
block holds only log(log(n)) elements and the sequential algorithm is 
linear. Then run a divide and conquer algorithm on the n/log(log(n)) 
maximums obtained from the blocks. With the right algorithm (Valiant's 
doubly logarithmic scheme) this stage is also O(log log n), so the overall 
complexity is O(log log n). And yes, it assumes you have n/log(log(n)) 
cores.
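
To make the two-stage scheme above concrete, here is a minimal D sketch. 
The name twoStageMax is mine, std.parallelism only gives a fixed thread 
pool rather than n/log(log(n)) cores, and the second stage uses a plain 
binary tree instead of the doubly logarithmic fan-out, so this only 
illustrates the structure of the algorithm, not the O(log log n) bound:

// Sketch of the two-stage max: sequential block maxima, then a parallel
// tree reduction over the block results.
import std.algorithm.comparison : max;
import std.algorithm.searching : maxElement;
import std.math : log2;
import std.parallelism : parallel;
import std.range : iota;
import std.stdio : writeln;

int twoStageMax(const int[] data)
{
    assert(data.length > 0);
    immutable n = data.length;

    // Stage 1: blocks of roughly log(log(n)) elements, each reduced
    // sequentially by "its own core".
    immutable blockSize = cast(size_t) max(1.0, log2(max(2.0, log2(n + 2.0))));
    immutable numBlocks = (n + blockSize - 1) / blockSize;
    auto blockMax = new int[numBlocks];

    foreach (b; parallel(iota(numBlocks)))
    {
        immutable lo = b * blockSize;
        immutable hi = lo + blockSize < n ? lo + blockSize : n;
        blockMax[b] = data[lo .. hi].maxElement;
    }

    // Stage 2: parallel tree reduction over the block maximums.  A real
    // doubly logarithmic tree uses a sqrt-sized fan-out per level; a
    // binary tree is used here only to keep the sketch short.
    auto cur = blockMax;
    while (cur.length > 1)
    {
        auto next = new int[(cur.length + 1) / 2];
        foreach (i; parallel(iota(next.length)))
        {
            immutable j = 2 * i;
            next[i] = j + 1 < cur.length ? max(cur[j], cur[j + 1]) : cur[j];
        }
        cur = next;
    }
    return cur[0];
}

void main()
{
    writeln(twoStageMax([3, 1, 4, 1, 5, 9, 2, 6]));  // prints 9
}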

> Finally, for small values of n, you could consider the number of cores
> sufficiently large, but, as was pointed out, using threads for max is
> actually impractical, so superscalar execution may be a better venue.
> Practically this means exploiting ILP by reducing data dependencies
> between intermediate results. I suggest you take a look at a thread
> entitled "challenge: implement the max function" that I started on
> 01/21/2007. That thread discusses ILP issues, and continues with the
> thread "challenge #2: implement the varargs_reduce metafunction".

I believe this is true in practice. I only mentioned the time complexity 
of the parallel version; I didn't make any claims about its real-world 
performance. If the problem is small enough, I'd use Python instead.
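
On the ILP point, here is a rough D sketch of the idea (the names are 
mine, not the code from the "challenge: implement the max function" 
thread): keeping several independent running maxima shortens the 
dependency chain between intermediate results, which is what lets a 
superscalar core overlap the comparisons.

import std.algorithm.comparison : max;

int maxILP(const int[] data)
{
    assert(data.length >= 4);
    // Four independent running maxima instead of one serial chain, so the
    // comparisons within one iteration do not depend on each other.
    int m0 = data[0], m1 = data[1], m2 = data[2], m3 = data[3];
    size_t i = 4;
    for (; i + 4 <= data.length; i += 4)
    {
        m0 = max(m0, data[i]);
        m1 = max(m1, data[i + 1]);
        m2 = max(m2, data[i + 2]);
        m3 = max(m3, data[i + 3]);
    }
    for (; i < data.length; ++i)   // leftover tail
        m0 = max(m0, data[i]);
    return max(max(m0, m1), max(m2, m3));
}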

