[OT] Efficient file structure for very large lookup tables?

Craig Dillabaugh cdillaba at cg.scs.carleton.ca
Wed Dec 18 06:03:20 PST 2013


On Tuesday, 17 December 2013 at 20:50:23 UTC, H. S. Teoh wrote:
> On Tue, Dec 17, 2013 at 08:47:17PM +0100, Craig Dillabaugh wrote:

>> Another question is can your queries be batched?  If that is the
>> case and your data is bigger than your available memory, then try
>> Googling "Lars Arge Buffer Tree" which might work well.
>
> Hmm. I'm not sure if it's possible to batch the queries unless I
> multithread the algorithm; the queries are generated on-the-fly.
> But they *are* somewhat predictable (spatial locality: see below),
> so maybe that might help?
>
I should clarify a bit what I meant by batched.  If you have a
sequence of queries q_1, q_2, ..., q_n (where I guess in your case a
query can be an insertion), then by 'batch' processing I mean that
you do not need to get the answers on the fly.  Is it sufficient to
have all the queries answered when the program terminates?

Note that these techniques still account for the 'order' of the
queries, so the answer to query q_i will be reported correctly (i.e.,
the same as if the queries were performed sequentially on a standard
data structure).  The sketch below illustrates the idea.
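To make that concrete, here is a minimal sketch in D of the
timestamping trick (the names, the tiny batch size, and the in-memory
associative array standing in for the real external-memory structure
are all mine, not from Arge's paper): every operation carries a
sequence number, operations accumulate in a buffer, and at flush time
they are replayed in timestamp order, so each query is answered
exactly as it would have been by sequential execution.

import std.algorithm : sort;
import std.stdio : writefln;

enum OpKind { insert, query }

struct Op {
    OpKind kind;
    int key;
    size_t seq; // timestamp: the operation's position in the stream
}

struct BatchedSet {
    enum batchSize = 4;   // tiny for illustration; in practice on
                          // the order of a disk block
    bool[int] contents;   // stand-in for the external-memory structure
    Op[] buffer;          // pending operations, not yet applied
    bool[size_t] answers; // query answers, keyed by query timestamp

    void push(Op op) {
        buffer ~= op;
        if (buffer.length >= batchSize)
            flush();
    }

    void flush() {
        // Replaying buffered operations in timestamp order preserves
        // sequential semantics: each query sees exactly the
        // insertions that preceded it in the original stream.  (A
        // real buffer tree merges the operations scattered across
        // its node buffers by timestamp in the same way.)
        sort!((a, b) => a.seq < b.seq)(buffer);
        foreach (op; buffer) {
            final switch (op.kind) {
            case OpKind.insert:
                contents[op.key] = true;
                break;
            case OpKind.query:
                answers[op.seq] = (op.key in contents) !is null;
                break;
            }
        }
        buffer.length = 0;
    }
}

void main() {
    BatchedSet s;
    s.push(Op(OpKind.insert, 42, 0));
    s.push(Op(OpKind.query, 42, 1)); // as of time 1: true
    s.push(Op(OpKind.query, 7, 2));  // as of time 2: false
    s.push(Op(OpKind.insert, 7, 3)); // fills the batch, triggers flush
    s.flush();                       // drain anything still buffered
    foreach (seq, ans; s.answers)
        writefln("query at time %s -> %s", seq, ans);
}

The I/O savings of a real buffer tree come from pushing such buffers
down a high-fanout tree one block at a time; the sketch only shows
the ordering guarantee that makes deferred answers safe.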


Also, if you are still considering hashing, the following paper may
be of interest to you (I haven't read it in detail, but it doesn't
look too crazy):

http://arxiv.org/abs/1104.2799


