shuffling lines in a stream

BCS ao at pathlink.com
Fri Oct 10 14:51:04 PDT 2008


Reply to Andrei,

> BCS wrote:
> 
>> Reply to Andrei,
>> 
>>> BCS wrote:
>>> 
>>>> I don't think there is any way to avoid storing the whole file
>>>> because for a uniform sort there is a possibility that the last
>>>> line will come out first.
>>>> 
>>> I agree with the last paragraph, but lseeking seems overly
>>> inefficient. Could you avoid that?
>>> 
>>> Andrei
>>> 
>> algorithmically, I don't think the lseek will matter,
>> 
> I think it does. Essentially you impose random access on the input, or
> copy to a medium that offers it.
> 
> gunzip --stdout bigfile.gz | shuffle
> 
> You'll have to compulsively store a copy of the input.

You have to anyway, or is that not what you agreed with above?
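
To make that concrete, here is a minimal sketch in Python (chosen for brevity, it is not code from this thread, and the name shuffle_lines is made up for illustration). Since the last input line may have to come out first, nothing can be emitted until the whole input has been read, so the sketch simply buffers every line and then shuffles in memory:

```python
import random


def shuffle_lines(stream, rng=None):
    """Return every line of `stream` in uniformly random order.

    `stream` is any iterable of text lines (an open file, sys.stdin, a list).
    `rng` is an optional random.Random instance, useful for repeatable runs.
    """
    rng = rng or random.Random()
    # Buffer the entire input: no line can be written before the last one
    # has been read. A missing final newline is added so that shuffled
    # lines never run together on output.
    lines = [ln if ln.endswith("\n") else ln + "\n" for ln in stream]
    rng.shuffle(lines)  # in-place Fisher-Yates shuffle
    return lines
```

Hooked up as sys.stdout.writelines(shuffle_lines(sys.stdin)), this would sit at the end of a pipeline like the gunzip one quoted above, at the cost of holding the full decompressed input in memory.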

> Besides, random
> access is kind of a dicey proposition on large files. Of course, only
> measurement will show...

Is that not an I/O effect?
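
For reference, the seek-based variant I have in mind looks roughly like the sketch below (Python again, and shuffle_by_offsets is a hypothetical name). It assumes the input is a regular, seekable file: one pass records where each line starts, the offsets are shuffled, and each line is then fetched with one seek. Only the offsets live in memory, but it cannot read from a pipe, so piped input would first have to be copied somewhere seekable.

```python
import random


def shuffle_by_offsets(path, rng=None):
    """Yield the lines of the file at `path` in random order.

    Only the byte offsets of the lines are held in memory; the line
    contents are re-read from the file with one seek per line.
    """
    rng = rng or random.Random()
    offsets = []
    with open(path, "rb") as f:
        # First pass: record the starting offset of every line.
        while True:
            pos = f.tell()
            if not f.readline():
                break
            offsets.append(pos)
        rng.shuffle(offsets)
        # Second pass: random access, one seek and one read per line.
        for pos in offsets:
            f.seek(pos)
            yield f.readline()
```

Whether the random seeks hurt on a large file is exactly the I/O and cache question, and only measurement would settle it.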

>>  as to I/O and cache effects, I'll leave that to someone else.
> 
> Andrei
> 





More information about the Digitalmars-d mailing list