[OT] shuffling lines in a stream
Andrei Alexandrescu
SeeWebsiteForEmail at erdani.org
Fri Oct 10 04:41:14 PDT 2008
BLS wrote:
> Andrei Alexandrescu schrieb:
>> The high traffic enjoyed by "random k-sample of a file" confirms that
>> plenty of people around here have idle cycles ready to be spent on fun
>> tasks.
>>
>> In that vein, here's another problem. Say you want to randomly shuffle
>> lines in a stream. You don't know the size beforehand, and one pass is
>> preferable. The stream can be larger than the working memory, but not
>> an unbounded amount of times larger. Also lines in the stream can be
>> unbounded, but it is guaranteed that at least a nonzero fraction of
>> the file fits in working memory.
>>
>> You can use temporary files, but of course minimizing their total size
>> is recommended. Speed is of course important. Draft your solution and
>> explain why you think it's correct and advantageous.
>>
>> In case you hate random stuff, just wait; the next problem won't have
>> any random element in it :o).
>>
>>
>> Andrei
>
> Just in general
> I would use a memory mapped file. Map data in chunks, say 4MB. Chunks
> are unmapped as necessary ~LRU, means implement a LRUCache.
>
> This is how some "in memory database-engines" solve that sort of problems.
> Also possible : I miss the real problem :(
I think I overspecified it. Pure and simple you must randomize lines in
a file that may not fit in memory. Note that memory mapping may help the
speed of whichever implementation, but gives zero indication on how you
would actually tackle the problem.
Please add [OT] to the title when answering, as I did, so people can
filter it out. I think these are cool things to discuss nonetheless in
D's context because it gives us ideas on language features and libraries.
Andrei
More information about the Digitalmars-d
mailing list