[OT] Finding longest documents

KennyTM~ kennytm at gmail.com
Sun Oct 12 23:57:22 PDT 2008


KennyTM~ wrote:
> Andrei Alexandrescu wrote:
>> Walter Bright wrote:
>>> Andrei Alexandrescu wrote:
>>>>    nil '(("\\(!(\\)[^()]*\\()\\)"
>>>
>>> I guess this is why I don't use emacs. I don't want to hear any more 
>>> grousing about !( ) after that!!!
>>
>> I agree about that, but why don't you use std.algorithm? :o)
>>
>> Speaking of which, here's another challenge for everybody who'd like 
>> to waste some cycles.
>>
>> Say you have a simple API for accessing a large collection of files - 
>> e.g. all of Google's cached documents. The task, should you accept it, 
>> is to find the 1,000,000 largest ones of those files. The output 
>> should be filenames sorted in decreasing order of size. Assume the API 
>> gives you <filename, size> pairs in a serial fashion. You can't use 
>> parallel processing (e.g. map/reduce on clusters), but you are allowed 
>> to use threads on one machine if if fancies you. Speed is highly 
>> desirable. Devise your algorithm and explain how fast it runs with 
>> practical and/or theoretical arguments.
>>
>>
>> Andrei
> 
> Straightforward solution: Use insertion sort.
> 
> auto list = new PriorityQueue!(Documents);
> 
> // Documents.opCmp returns difference of two documents' length.
> // the SHORTEST document gets priority.
> 
> foreach (doc; google_cache) {     // O(N)
>   if (list.length >= 1_000_000)
>     list.pop();                   // O(log 1M)
>   list.push(doc);                 // O(log 1M)
> }
> 
> auto array = convertToArray(list); // O(1M log 1M)
> array.reverse();                   // O(1M)
> 
> write(array);                      // O(1M).
> 
> Total time complexity: O(N log 1M).

Sorry, wrong solution. Forgot to compare before pop.

auto list = new PriorityQueue!(Documents, "a < b");

// Documents.opCmp returns difference of two documents' length.
// the SHORTEST document gets priority.

foreach (doc; google_cache) {     // O(N)
   if (list.length >= 1_000_000 && list.top() < doc)
     list.pop();                   // O(log 1M)
   list.push(doc);                 // O(log 1M)
}

auto array = convertToArray(list); // O(1M log 1M)
array.reverse();                   // O(1M)

write(array);                      // O(1M).

Total time complexity: O(N log 1M).



More information about the Digitalmars-d mailing list