shuffling lines in a stream

Andrei Alexandrescu SeeWebsiteForEmail at erdani.org
Fri Oct 10 14:56:48 PDT 2008


BCS wrote:
> Reply to Andrei,
> 
>> BCS wrote:
>>
>>> Reply to Andrei,
>>>
>>>> BCS wrote:
>>>>
>>>>> I don't think there is any way to avoid storing the whole file
>>>>> because for a uniform sort there is a possibility that the last
>>>>> line will come out first.
>>>>>
>>>> I agree with the last paragraph, but lseeking seems overly
>>>> inefficient. Could you avoid that?
>>>>
>>>> Andrei
>>>>
>>> algorithmically, I don't think the lseek will matter,
>>>
>> I think it does. Essentially you impose random access on the input, or
>> copy to a medium that offers it.
>>
>> gunzip --stdout bigfile.gz | shuffle
>>
>> You'll have to compulsively store a copy of the input.
> 
> You have to anyway, or is that not what you agreed with above?

Well, I agree things can be coded such that you make the copy only if the 
file is too large anyway. But one lseek call per line is a complete 
disaster, whether you call it an I/O effect or not. I'd go as far as 
saying I'd never even try such a solution, especially since there exists 
a forward-pass algorithm. Of course, measuring would be a great way to 
prove me wrong.
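
For illustration only (no code appears in the thread): below is a minimal
sketch, in Python, of one kind of forward-pass shuffle in the spirit of the
argument above. The shuffle_stream name is hypothetical. It reads the pipe
exactly once, buffers the lines in memory, and swaps each new line into a
uniformly random earlier slot (an inside-out Fisher-Yates), so no lseek or
random access on the input is needed; spilling to a temporary file when the
input exceeds memory is left out of the sketch.

    # Forward-pass shuffle sketch: works on a pipe, e.g.
    #   gunzip --stdout bigfile.gz | python3 shuffle.py
    # Each line is appended, then swapped with a uniformly chosen earlier
    # slot, which makes every permutation of the input equally likely.
    import random
    import sys

    def shuffle_stream(lines):
        buf = []
        for line in lines:
            buf.append(line)
            j = random.randrange(len(buf))   # uniform over [0, len(buf) - 1]
            buf[-1], buf[j] = buf[j], buf[-1]
        return buf

    if __name__ == "__main__":
        sys.stdout.writelines(shuffle_stream(sys.stdin))
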

Andrei


