[OT] Finding longest documents

Sergey Gromov snake.scaly at gmail.com
Sun Oct 12 17:45:44 PDT 2008


Sun, 12 Oct 2008 19:11:00 -0500,
Andrei Alexandrescu wrote:
> Walter Bright wrote:
> > Andrei Alexandrescu wrote:
> >>    nil '(("\\(!(\\)[^()]*\\()\\)"
> > 
> > I guess this is why I don't use emacs. I don't want to hear any more 
> > grousing about !( ) after that!!!
> 
> I agree about that, but why don't you use std.algorithm? :o)
> 
> Speaking of which, here's another challenge for everybody who'd like to 
> waste some cycles.
> 
> Say you have a simple API for accessing a large collection of files - 
> e.g. all of Google's cached documents. The task, should you accept it, 
> is to find the 1,000,000 largest ones of those files. The output should 
> be filenames sorted in decreasing order of size. Assume the API gives 
> you <filename, size> pairs in a serial fashion. You can't use parallel 
> processing (e.g. map/reduce on clusters), but you are allowed to use 
> threads on one machine if it fancies you. Speed is highly desirable. 
> Devise your algorithm and explain how fast it runs with practical and/or 
> theoretical arguments.

A straightforward solution would be putting name-size pairs into a tree 
sorted by size, evicting the smallest element whenever the tree grows 
past a million entries. For N documents that's O(N log 1e6) overall. 
You can optimize it by skipping elements smaller than the smallest 
element in the tree once the tree is full, which makes the common case 
O(1) per document.
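A sketch of that idea, using a size-capped min-heap in place of the tree (Python here just for illustration; the function name and sample input are made up):

```python
import heapq

def largest_files(pairs, k=1_000_000):
    """Keep the k largest (size, name) pairs in a min-heap.

    `pairs` is any iterable of (filename, size) tuples, standing in
    for the serial API from the challenge.
    """
    heap = []  # min-heap: smallest kept size sits at heap[0]
    for name, size in pairs:
        if len(heap) < k:
            heapq.heappush(heap, (size, name))
        elif size > heap[0][0]:
            # beats the current minimum: evict it in one O(log k) step
            heapq.heapreplace(heap, (size, name))
        # else: smaller than everything kept -- skipped in O(1)
    # filenames in decreasing order of size
    return [name for size, name in sorted(heap, reverse=True)]
```

Each document costs O(log k) at worst and O(1) when skipped, so the whole pass is O(N log k) time and O(k) memory regardless of how large the collection is.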

I haven't found any trees in Phobos though.

Am I missing something?



More information about the Digitalmars-d mailing list