shuffling lines in a stream
BCS
ao at pathlink.com
Fri Oct 10 14:51:04 PDT 2008
Reply to Andrei,
> BCS wrote:
>
>> Reply to Andrei,
>>
>>> BCS wrote:
>>>
>>>> I don't think there is any way to avoid storing the whole file
>>>> because for a uniform sort there is a possibility that the last
>>>> line will come out first.
>>>>
>>> I agree with the last paragraph, but lseeking seems overly
>>> inefficient. Could you avoid that?
>>>
>>> Andrei
>>>
>> algorithmically, I don't think the lseek will matter,
>>
> I think it does. Essentially you impose random access on the input, or
> copy to a medium that offers it.
>
> gunzip --stdout bigfile.gz | shuffle
>
> You'll have to compulsively store a copy of the input.
You have to anyway, or is that not what you agreed with above?
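
To make the storage point concrete, here is a minimal sketch (in Python, not code from this thread) of a shuffle filter that has to buffer the entire input before it can emit its first line, since any line, including the last, may come out first. The name `shuffle_lines` and the `rng` parameter are made up for illustration.

```python
import random


def shuffle_lines(stream, rng=random):
    """Return every line of `stream` in uniformly random order.

    `stream` is any text file object (a pipe works too). The whole
    input is held in memory, because no output line can be chosen
    until the last input line has been seen.
    """
    lines = stream.readlines()  # buffers the complete input
    rng.shuffle(lines)          # in-place uniform (Fisher-Yates) shuffle
    return lines
```

As a filter at the end of a pipe this would be driven by something like `sys.stdout.writelines(shuffle_lines(sys.stdin))`. No seeking is involved, but memory use grows with the size of the input.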
> Besides, random
> access is kind of a dicey proposition on large files. Of course, only
> measurement will show...
Is that not an I/O effect?
>> as to I/O and cache effects, I'll leave that to someone else.
>
> Andrei
>
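
For reference, a rough sketch of the seek-based variant being debated: record the byte offset of each line, shuffle the offsets, then seek to each one in turn. It is written in Python and is not code from this thread; `shuffle_by_offsets` and its parameters are hypothetical names. Input that cannot seek, such as the gunzip pipe above, is first copied to a temporary file, which is the extra copy Andrei is pointing at.

```python
import random
import shutil
import tempfile


def shuffle_by_offsets(src, dst, rng=random):
    """Write the lines of binary file `src` to `dst` in random order.

    Only the line offsets are kept in memory; the line data is
    fetched with one seek per line.
    """
    spool = None
    if not src.seekable():
        # A pipe offers no random access, so copy it to a medium that does.
        spool = tempfile.TemporaryFile()
        shutil.copyfileobj(src, spool)
        spool.seek(0)
        src = spool
    try:
        # Pass 1: sequential scan recording where each line starts.
        offsets = []
        pos = src.tell()
        for line in iter(src.readline, b""):
            offsets.append(pos)
            pos += len(line)

        # Pass 2: visit the lines in random order (random-access reads).
        rng.shuffle(offsets)
        for off in offsets:
            src.seek(off)
            # Assumes every line, including the last, ends with a newline.
            dst.write(src.readline())
    finally:
        if spool is not None:
            spool.close()
```

Memory drops to one offset per line, at the price of a random read per line in the second pass. Whether that price matters on large files is the part that only measurement would settle.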