shuffling lines in a stream

BCS ao at pathlink.com
Thu Oct 9 22:38:12 PDT 2008


Reply to Andrei,

> BCS wrote:
> 
>> Reply to Andrei,
>> 
>>> You don't know the size beforehand, and
>>> one pass is preferable. The stream can be larger than the working
>>> memory, but not an unbounded amount of times larger. Also lines in
>>> the stream can be unbounded, but it is guaranteed that at least a
>>> nonzero fraction of the file fits in working memory.
>> I got the bit about the end effect. The above is what I didn't get.
>> 
> Don't worry about that for now.
> 
> Andrei
> 

cache the whole file into a tmp file
store slice data on the way through (start/length of each line)
shuffle that set
lseek/read/write each slice in order

Quick'n'dirty

The only fun part is how to shuffle the deck. What I think might be easiest 
to do would be to qsort it with rand as the compare function.

I don't think there is any way to avoid storing the whole file because for 
a uniform sort there is a possibility that the last line will come out first.





More information about the Digitalmars-d mailing list