[OT] Finding longest documents

KennyTM~ kennytm at gmail.com
Sun Oct 12 23:59:37 PDT 2008


Andrei Alexandrescu wrote:
> KennyTM~ wrote:
>> Andrei Alexandrescu wrote:
>>> Walter Bright wrote:
>>>> Andrei Alexandrescu wrote:
>>>>>    nil '(("\\(!(\\)[^()]*\\()\\)"
>>>>
>>>> I guess this is why I don't use emacs. I don't want to hear any more 
>>>> grousing about !( ) after that!!!
>>>
>>> I agree about that, but why don't you use std.algorithm? :o)
>>>
>>> Speaking of which, here's another challenge for everybody who'd like 
>>> to waste some cycles.
>>>
>>> Say you have a simple API for accessing a large collection of files - 
>>> e.g. all of Google's cached documents. The task, should you accept 
>>> it, is to find the 1,000,000 largest ones of those files. The output 
>>> should be filenames sorted in decreasing order of size. Assume the 
>>> API gives you <filename, size> pairs in a serial fashion. You can't 
>>> use parallel processing (e.g. map/reduce on clusters), but you are 
>>> allowed to use threads on one machine if if fancies you. Speed is 
>>> highly desirable. Devise your algorithm and explain how fast it runs 
>>> with practical and/or theoretical arguments.
>>>
>>>
>>> Andrei
>>
>> Straightforward solution: Use insertion sort.
>>
>> auto list = new PriorityQueue!(Documents);
> 
> One point taken off for not using the new syntax 
> PriorityQueue!Documents. Ahem. On a serious note, you should specify the 
> unusual comparison predicate:
> 

Will use it when it's announced :)

> auto list = new PriorityQueue!(Documents, "a > b");
> 
> as I supposed the default version would use less-than. Or does it?
> 

I don't know :). I haven't written a priority queue in D (only stacks 
and queues) :)

>> // Documents.opCmp returns difference of two documents' length.
>> // the SHORTEST document gets priority.
>>
>> foreach (doc; google_cache) {     // O(N)
>>   if (list.length >= 1_000_000)
>>     list.pop();                   // O(log 1M)
>>   list.push(doc);                 // O(log 1M)
>> }
>>
>> auto array = convertToArray(list); // O(1M log 1M)
>> array.reverse();                   // O(1M)
>>
>> write(array);                      // O(1M).
>>
>> Total time complexity: O(N log 1M).
> 
> Great! I'm glad there's so much interest and competency around here.
> 
> 
> Andrei



More information about the Digitalmars-d mailing list