[OT] The horizon of a stream

Christopher Wright dhasenan at gmail.com
Sat Nov 1 10:54:33 PDT 2008


Andrei Alexandrescu wrote:
> Consider some text stream. We define the horizon of the stream as the 
> longest number of lines between two identical lines minus one, or zero 
> if the stream consists of unique lines.
> 
> For example, this stream has horizon 9:
> 
> lorem
> ipsum
> dolor
> habeas
> corpus
> ipsum
> sit
> amet
> rigor
> mortis
> consectetur
> coitus
> interruptus
> corpus
> adipisicing
> elit
> 
> This is because the word "corpus" occurs on line 5 and line 14. This is 
> the farthest two lines in the file are; notice that "ipsum" also occurs 
> twice, but at a smaller distance.
> 
> The challenge, should you blah blah, is to compute the horizon of a 
> potentially large file. Of course theoretical and practical speed are as 
> important as always.
> 
> This is a task that's interesting and subtle in quite a number of ways, 
> and it allows for many creative solutions. Therefore I intentionally did 
> not restrict it much. Let ideas flow!
> 
> 
> Andrei

Simplicity considering external memory constraints.
Read in the file, line by line. Output the line followed by the line 
number. Sort the output (merge sort works wonderfully). Now you go 
through it a second time and find the maximum distance.

This takes O(n log n) time, which is far from optimal. However, I think 
we already did sorting in external memory, so I could implement this 
much quicker than most alternatives. Depending on the number of times 
this has to be done, that might be a net advantage.



More information about the Digitalmars-d mailing list