[OT] Finding longest documents

Sun Oct 12 18:49:58 PDT 2008

Anon wrote:
> Andrei Alexandrescu Wrote:
> 
>> Walter Bright wrote:
>>> Andrei Alexandrescu wrote:
>>>>    nil '(("\\(!(\\)[^()]*\\()\\)"
>>> I guess this is why I don't use emacs. I don't want to hear any more 
>>> grousing about !( ) after that!!!
>> I agree about that, but why don't you use std.algorithm? :o)
>>
>> Speaking of which, here's another challenge for everybody who'd like to 
>> waste some cycles.
>>
>> Say you have a simple API for accessing a large collection of files - 
>> e.g. all of Google's cached documents. The task, should you accept it, 
>> is to find the 1,000,000 largest ones of those files. The output should 
>> be filenames sorted in decreasing order of size. Assume the API gives 
>> you <filename, size> pairs in a serial fashion. You can't use parallel 
>> processing (e.g. map/reduce on clusters), but you are allowed to use 
>> threads on one machine if if fancies you. Speed is highly desirable. 
>> Devise your algorithm and explain how fast it runs with practical and/or 
>> theoretical arguments.
>>
>>
>> Andrei
>  
> implement a max heap on top of a fixed size array. "insert" all pairs to the heap (if the current is smaller than everything on the heap and the heap is full the pair is discarded)
> insert is O(log(heap height)) =O(log (1_000_000)) for each file in google's cache, remove top is O(1).

A winner but gets penalty points for not admitting being superdan :o).

Andrei