shuffling lines in a stream

Andrei Alexandrescu SeeWebsiteForEmail at erdani.org
Fri Oct 10 14:23:27 PDT 2008


BCS wrote:
> Reply to Andrei,
> 
>> BCS wrote:
>>
>>> I don't think there is any way to avoid storing the whole file
>>> because for a uniform sort there is a possibility that the last line
>>> will come out first.
>>>
>> I agree with the last paragraph, but lseeking seems overly
>> inefficient. Could you avoid that?
>>
>> Andrei
>>
> 
> algorithmically, I don't think the lseek will matter,

I think it does. Essentially you either require random access on the
input, or you copy the input to a medium that offers it. Consider:

gunzip --stdout bigfile.gz | shuffle

A pipe cannot be seeked, so here you are forced to store a copy of the
input. Besides, random access is kind of a dicey proposition on large
files. Of course, only measurement will show...
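
To make the buffering point concrete, here is a minimal sketch in Python (not code from this thread). It assumes the input is a forward-only iterable of lines, as a pipe would be, and uses the "inside-out" variant of the Fisher-Yates shuffle, which never seeks on the input but still keeps every line in memory. The name shuffle_stream and the rng parameter are hypothetical, chosen only for illustration.

```python
import random


def shuffle_stream(lines, rng=None):
    """Return the items of a forward-only iterable in uniformly random order.

    Inside-out Fisher-Yates: each incoming line is placed at a random
    position among the lines seen so far. No seeking on the input is
    needed, but the whole input ends up stored in `buf`.
    """
    if rng is None:
        rng = random.Random()
    buf = []
    for line in lines:
        # Pick a slot in [0, len(buf)], both ends inclusive.
        j = rng.randint(0, len(buf))
        if j == len(buf):
            # The new line goes at the end.
            buf.append(line)
        else:
            # Move the current occupant of slot j to the end,
            # then put the new line in slot j.
            buf.append(buf[j])
            buf[j] = line
    return buf
```

Calling shuffle_stream(sys.stdin) and writing the result with sys.stdout.writelines would let such a script sit at the end of the pipeline above. Memory use grows with the size of the input, which is exactly the cost under discussion.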


Andrei



More information about the Digitalmars-d mailing list