shuffling lines in a stream
Andrei Alexandrescu
SeeWebsiteForEmail at erdani.org
Thu Oct 9 23:08:28 PDT 2008
BCS wrote:
> Reply to Andrei,
>
>> BCS wrote:
>>
>>> Reply to Andrei,
>>>
>>>> You don't know the size beforehand, and
>>>> one pass is preferable. The stream can be larger than the working
>>>> memory, but not an unbounded amount of times larger. Also lines in
>>>> the stream can be unbounded, but it is guaranteed that at least a
>>>> nonzero fraction of the file fits in working memory.
>>> I got the bit about the end effect. The above is what I didn't get.
>>>
>> Don't worry about that for now.
>>
>> Andrei
>>
>
> cache the whole file into a tmp file
> store slice data on the way through (start/length of each line)
> shuffle that set
> lseek/read/write each slice in order
>
> Quick'n'dirty
>
> The only fun part is how to shuffle the deck. What I think might be
> easiest to do would be to qsort it with rand as the compare function.
>
> I don't think there is any way to avoid storing the whole file because
> for a uniform sort there is a possibility that the last line will come
> out first.
I agree with the last paragraph, but lseeking seems overly inefficient.
Could you avoid that?
Andrei
More information about the Digitalmars-d
mailing list