shuffling lines in a stream

Andrei Alexandrescu SeeWebsiteForEmail at erdani.org
Thu Oct 9 23:08:28 PDT 2008


BCS wrote:
> Reply to Andrei,
> 
>> BCS wrote:
>>
>>> Reply to Andrei,
>>>
>>>> You don't know the size beforehand, and
>>>> one pass is preferable. The stream can be larger than the working
>>>> memory, but not an unbounded amount of times larger. Also lines in
>>>> the stream can be unbounded, but it is guaranteed that at least a
>>>> nonzero fraction of the file fits in working memory.
>>> I got the bit about the end effect. The above is what I didn't get.
>>>
>> Don't worry about that for now.
>>
>> Andrei
>>
> 
> cache the whole file into a tmp file
> store slice data on the way through (start/length of each line)
> shuffle that set
> lseek/read/write each slice in order
> 
> Quick'n'dirty
> 
> The only fun part is how to shuffle the deck. What I think might be 
> easiest to do would be to qsort it with rand as the compare function.
> 
> I don't think there is any way to avoid storing the whole file because 
> for a uniform sort there is a possibility that the last line will come 
> out first.

I agree with the last paragraph, but lseeking seems overly inefficient. 
Could you avoid that?

Andrei



More information about the Digitalmars-d mailing list