shuffling lines in a stream
Andrei Alexandrescu
SeeWebsiteForEmail at erdani.org
Fri Oct 10 14:56:48 PDT 2008
BCS wrote:
> Reply to Andrei,
>
>> BCS wrote:
>>
>>> Reply to Andrei,
>>>
>>>> BCS wrote:
>>>>
>>>>> I don't think there is any way to avoid storing the whole file,
>>>>> because for a uniform shuffle there is a possibility that the last
>>>>> line will come out first.
>>>>>
>>>> I agree with the last paragraph, but lseeking seems overly
>>>> inefficient. Could you avoid that?
>>>>
>>>> Andrei
>>>>
>>> Algorithmically, I don't think the lseek will matter.
>>>
>> I think it does. Essentially you either impose random access on the
>> input, or copy it to a medium that offers it.
>>
>> gunzip --stdout bigfile.gz | shuffle
>>
>> You'll necessarily have to store a copy of the input.
>
> You have to anyway, or is that not what you agreed with above?
Well, I agree things can be coded such that you only make the copy when
the input is too large to fit in memory. But one lseek call per line is
a complete disaster, whether you call it an I/O effect or whatnot. I'd
go so far as to say I'd never even try such a solution, especially since
there exists a forward-pass algorithm. Of course, measuring would be a
great way to prove me wrong.
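
As a concrete illustration of the forward-pass idea, here is a minimal
sketch in present-day D (names such as byLine and randomShuffle are
current Phobos, not anything quoted in this thread): read the piped
input once into memory, shuffle it in place, and write it out, with no
lseek on the input at all.

import std.algorithm : map;
import std.array : array;
import std.random : randomShuffle;
import std.stdio : stdin, writeln;

void main()
{
    // One forward pass over the stream: byLine reuses its buffer,
    // so each line is duplicated before being stored.
    auto lines = stdin.byLine.map!(l => l.idup).array;

    // Uniform in-place shuffle (Fisher-Yates), then emit.
    lines.randomShuffle();
    foreach (line; lines)
        writeln(line);
}

This only works while the whole input fits in memory; beyond that the
copy has to go to a seekable temporary file, which is exactly the case
being debated above.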
Andrei