[OT] Efficient file structure for very large lookup tables?
Joseph Cassman
jc7919 at outlook.com
Tue Dec 17 11:43:17 PST 2013
On Tuesday, 17 December 2013 at 19:09:49 UTC, H. S. Teoh wrote:
> Another OT thread to pick your brains. :)
>
> What's a good, efficient file structure for storing extremely large
> lookup tables? (Extremely large as in > 10 million entries, with
> keys and values roughly 100 bytes each.) The structure must support
> efficient adding and lookup of entries, as these two operations will
> be very frequent.
>
> I did some online research, and it seems that hashtables perform
> poorly on disk, because the usual hash functions cause random
> scattering of related data (which are likely to be accessed with
> higher temporal locality), which incurs lots of disk seeks.
>
> I thought about B-trees, but they have high overhead (and are a pain
> to implement), and they only exhibit good locality if table entries
> are accessed sequentially; the problem is that I'm working with
> high-dimensional data, and the order of accesses is unlikely to be
> sequential. However, the accesses do exhibit good spatial locality
> in higher-dimensional space (i.e., if entry X is accessed first, the
> next entry Y is quite likely to be close to X in that space). Does
> anybody know of a good data structure that can take advantage of
> this fact to minimize disk accesses?
>
>
> T
A burst trie might work, although it was originally designed for
string keys. I haven't had a chance to try one out in code yet, but
the paper by Heinz, Zobel, and Williams is interesting; a rough
sketch of the core idea follows below the reference.
Burst Tries: A Fast, Efficient Data Structure for String Keys (2002)
http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.18.3499
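
For a concrete picture, here is a minimal in-memory sketch in D of
the burst-trie idea as I understand it from the paper, not their
actual implementation. Leaves are small "container" nodes holding
suffix-to-value pairs; when a container grows past a threshold it is
"burst" into an internal node with one child per next byte. The names
(Node, burstLimit, eosValue), the built-in associative array as the
container, and 32 as the threshold are my own placeholder choices:

import std.typecons : Nullable, nullable;

enum burstLimit = 32; // hypothetical; the paper tunes this empirically

final class Node
{
    bool isContainer = true;
    string[string] container; // suffix -> value (container nodes only)
    Node[256] children;       // one child per next byte (internal nodes only)
    Nullable!string eosValue; // value for the empty suffix at an internal node

    void insert(string key, string value)
    {
        if (isContainer)
        {
            container[key] = value;
            if (container.length > burstLimit)
                burst();
        }
        else if (key.length == 0)
            eosValue = value;
        else
        {
            immutable b = key[0];
            if (children[b] is null)
                children[b] = new Node; // new children start as containers
            children[b].insert(key[1 .. $], value);
        }
    }

    Nullable!string lookup(string key)
    {
        if (isContainer)
        {
            if (auto p = key in container)
                return nullable(*p);
            return Nullable!string.init;
        }
        if (key.length == 0)
            return eosValue;
        auto child = children[key[0]];
        return child is null ? Nullable!string.init
                             : child.lookup(key[1 .. $]);
    }

    // Split an overfull container: redistribute its entries by first byte.
    private void burst()
    {
        auto old = container;
        container = null;
        isContainer = false;
        foreach (suffix, val; old)
            insert(suffix, val); // now takes the internal-node path
    }
}

void main()
{
    auto root = new Node;
    root.insert("hello", "world");
    root.insert("help", "me");
    assert(root.lookup("hello").get == "world");
    assert(root.lookup("nope").isNull);
}

The relevance to your problem is that all keys sharing a prefix end
up in the same container, so an on-disk variant could store each
container as one page: the small trie of prefixes stays cached in
memory, and most lookups then cost a single page read.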
Joseph