shuffling lines in a stream

BLS nanali at nospam-wanadoo.fr
Thu Oct 9 22:23:41 PDT 2008


Andrei Alexandrescu schrieb:
> The high traffic enjoyed by "random k-sample of a file" confirms that 
> plenty of people around here have idle cycles ready to be spent on fun 
> tasks.
> 
> In that vein, here's another problem. Say you want to randomly shuffle 
> lines in a stream. You don't know the size beforehand, and one pass is 
> preferable. The stream can be larger than the working memory, but not an 
> unbounded amount of times larger. Also lines in the stream can be 
> unbounded, but it is guaranteed that at least a nonzero fraction of the 
> file fits in working memory.
> 
> You can use temporary files, but of course minimizing their total size 
> is recommended. Speed is of course important. Draft your solution and 
> explain why you think it's correct and advantageous.
> 
> In case you hate random stuff, just wait; the next problem won't have 
> any random element in it :o).
> 
> 
> Andrei

Just in general
I would use a memory mapped file. Map data in chunks, say 4MB. Chunks 
are unmapped as necessary ~LRU, means implement a LRUCache.

This is how some "in memory database-engines" solve that sort of problems.
Also possible : I miss the real problem :(
Bjoern



More information about the Digitalmars-d mailing list