shuffling lines in a stream
BCS
ao at pathlink.com
Thu Oct 9 22:38:12 PDT 2008
Reply to Andrei,
> BCS wrote:
>
>> Reply to Andrei,
>>
>>> You don't know the size beforehand, and
>>> one pass is preferable. The stream can be larger than the working
>>> memory, but not an unbounded amount of times larger. Also lines in
>>> the stream can be unbounded, but it is guaranteed that at least a
>>> nonzero fraction of the file fits in working memory.
>> I got the bit about the end effect. The above is what I didn't get.
>>
> Don't worry about that for now.
>
> Andrei
>
cache the whole file into a tmp file
store slice data on the way through (start/length of each line)
shuffle that set
lseek/read/write each slice in order
Quick'n'dirty
The only fun part is how to shuffle the deck. What I think might be easiest
to do would be to qsort it with rand as the compare function.
I don't think there is any way to avoid storing the whole file because for
a uniform sort there is a possibility that the last line will come out first.
More information about the Digitalmars-d
mailing list