[OT] The horizon of a stream

Andrei Alexandrescu SeeWebsiteForEmail at erdani.org
Sat Nov 1 11:19:54 PDT 2008


Christopher Wright wrote:
> Andrei Alexandrescu wrote:
>> Consider some text stream. We define the horizon of the stream as the 
>> longest number of lines between two identical lines minus one, or zero 
>> if the stream consists of unique lines.
>>
>> For example, this stream has horizon 9:
>>
>> lorem
>> ipsum
>> dolor
>> habeas
>> corpus
>> ipsum
>> sit
>> amet
>> rigor
>> mortis
>> consectetur
>> coitus
>> interruptus
>> corpus
>> adipisicing
>> elit
>>
>> This is because the word "corpus" occurs on line 5 and line 14. This 
>> is the farthest two lines in the file are; notice that "ipsum" also 
>> occurs twice, but at a smaller distance.
>>
>> The challenge, should you blah blah, is to compute the horizon of a 
>> potentially large file. Of course theoretical and practical speed are 
>> as important as always.
>>
>> This is a task that's interesting and subtle in quite a number of 
>> ways, and it allows for many creative solutions. Therefore I 
>> intentionally did not restrict it much. Let ideas flow!
>>
>>
>> Andrei
> 
> Simplicity considering external memory constraints.
> Read in the file, line by line. Output the line followed by the line 
> number. Sort the output (merge sort works wonderfully). Now you go 
> through it a second time and find the maximum distance.
> 
> This takes O(n log n) time, which is far from optimal. However, I think 
> we already did sorting in external memory, so I could implement this 
> much quicker than most alternatives. Depending on the number of times 
> this has to be done, that might be a net advantage.

I like this solution a lot. Kudos!

Andrei



More information about the Digitalmars-d mailing list