[OT] The horizon of a stream
Christopher Wright
dhasenan at gmail.com
Sat Nov 1 10:54:33 PDT 2008
Andrei Alexandrescu wrote:
> Consider some text stream. We define the horizon of the stream as the
> longest number of lines between two identical lines minus one, or zero
> if the stream consists of unique lines.
>
> For example, this stream has horizon 9:
>
> lorem
> ipsum
> dolor
> habeas
> corpus
> ipsum
> sit
> amet
> rigor
> mortis
> consectetur
> coitus
> interruptus
> corpus
> adipisicing
> elit
>
> This is because the word "corpus" occurs on line 5 and line 14. This is
> the farthest two lines in the file are; notice that "ipsum" also occurs
> twice, but at a smaller distance.
>
> The challenge, should you blah blah, is to compute the horizon of a
> potentially large file. Of course theoretical and practical speed are as
> important as always.
>
> This is a task that's interesting and subtle in quite a number of ways,
> and it allows for many creative solutions. Therefore I intentionally did
> not restrict it much. Let ideas flow!
>
>
> Andrei
Simplicity considering external memory constraints.
Read in the file, line by line. Output the line followed by the line
number. Sort the output (merge sort works wonderfully). Now you go
through it a second time and find the maximum distance.
This takes O(n log n) time, which is far from optimal. However, I think
we already did sorting in external memory, so I could implement this
much quicker than most alternatives. Depending on the number of times
this has to be done, that might be a net advantage.
More information about the Digitalmars-d
mailing list