[OT] Finding longest documents

Sun Oct 12 17:33:03 PDT 2008

Andrei Alexandrescu Wrote:

> Walter Bright wrote:
> > Andrei Alexandrescu wrote:
> >>    nil '(("\\(!(\\)[^()]*\\()\\)"
> > 
> > I guess this is why I don't use emacs. I don't want to hear any more 
> > grousing about !( ) after that!!!
> 
> I agree about that, but why don't you use std.algorithm? :o)
> 
> Speaking of which, here's another challenge for everybody who'd like to 
> waste some cycles.
> 
> Say you have a simple API for accessing a large collection of files - 
> e.g. all of Google's cached documents. The task, should you accept it, 
> is to find the 1,000,000 largest ones of those files. The output should 
> be filenames sorted in decreasing order of size. Assume the API gives 
> you <filename, size> pairs in a serial fashion. You can't use parallel 
> processing (e.g. map/reduce on clusters), but you are allowed to use 
> threads on one machine if it fancies you. Speed is highly desirable. 
> Devise your algorithm and explain how fast it runs with practical and/or 
> theoretical arguments.
> 
> 
> Andrei

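Implement a min-heap on top of a fixed-size array of 1_000_000 entries, keyed on size, so the smallest file kept so far sits at the top. "Insert" all pairs into the heap: while the heap isn't full, just add the pair; once it is full, compare the pair against the top and discard it if it is smaller, otherwise replace the top with it.
Insert is O(heap height) = O(log(1_000_000)) for each file in Google's cache, and a discarded pair costs only one O(1) comparison against the top. Peeking at the top is O(1); removing it is O(log(1_000_000)). At the end, removing the top repeatedly yields the files in increasing order of size, so reverse that to get the requested decreasing order.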
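
A rough sketch of the bounded-heap idea, in Python for brevity rather than D. It uses a min-heap (the standard heapq module), so the smallest file kept so far is at the top and the "discard" check is a single comparison. The name largest_files and the (filename, size) iterable are my own stand-ins for the serial API in the challenge, not anything Andrei specified.

```python
import heapq
from typing import Iterable, List, Tuple


def largest_files(pairs: Iterable[Tuple[str, int]], k: int) -> List[Tuple[str, int]]:
    """Return the k largest (filename, size) pairs, largest first.

    `pairs` is a hypothetical stand-in for the serial <filename, size> API.
    """
    if k <= 0:
        return []

    # Min-heap of (size, filename): heap[0] is the smallest file kept so far.
    heap: List[Tuple[int, str]] = []
    for name, size in pairs:
        if len(heap) < k:
            # Heap not full yet: always keep the pair, O(log k).
            heapq.heappush(heap, (size, name))
        elif size > heap[0][0]:
            # Larger than the smallest kept file: replace the top, O(log k).
            heapq.heapreplace(heap, (size, name))
        # Otherwise the pair is discarded after one O(1) comparison.

    # Final ordering: sort the k survivors by decreasing size, O(k log k).
    heap.sort(reverse=True)
    return [(name, size) for size, name in heap]


if __name__ == "__main__":
    sample = [("a", 5), ("b", 1), ("c", 9), ("d", 7)]
    print(largest_files(sample, 2))
```

Memory stays at k entries no matter how many files stream past, and for a collection much larger than k most pairs should fall into the cheap discard branch.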