[OT] Efficient file structure for very large lookup tables?

Ola Fosheim Grøstad" <ola.fosheim.grostad+dlang at gmail.com> Ola Fosheim Grøstad" <ola.fosheim.grostad+dlang at gmail.com>
Tue Dec 17 13:26:15 PST 2013


On Tuesday, 17 December 2013 at 20:54:56 UTC, H. S. Teoh wrote:
> big it got before I killed the process). Suffice it to say that 
> this is
> a combinatorial problem, so the number of entries grow 
> exponentially;
> anything that can help reduce the storage requirements / I/O 
> latency
> would be a big help.

If you can buffer the queries in a queue then you can issue a 
prefetch-request to the OS to bring in the memory-page from disk 
when you put it into the queue to prevent the process from being 
put to sleep, the length of the queue has to be tailored to how 
fast it can load the page.

If the data is uniformly distributed then perhaps you could 
partition the disk-space with a n-dimensional grid, and then have 
a key-value store that you page into memory?

If you can do some queries out of order then you probably should 
set up some "buckets"/"bins" in the queue and prefetch the page 
that is referenced by the fullest/oldest bucket if you can do 
some queries out of order. Just to avoid that pages are pushed in 
and out of memory all the time.

Otherwise a kd-tree with a scaled grid per leaf that stays within 
a memorypage, probably could work, but that sounds like a lot of 
work. Implementing n-dimensional datastructures is conceptually 
easy, but tedious in reality (can you trust it to be bug free? It 
is hard to visualize in text.).

If you have to do the spatial datastructure yourself, then 
perhaps an octree or n-dimensional oc-tree would be helpful. You 
could make it shallow and in memory and use it both to buffer 
queries and to store indexes to disk-data. It would be 2^N nodes 
per level. (2D data has 4 nodes, 3D has 8 nodes)

I once read a paper of mapping GIS data to a regular database 
(mapping 2D to 1D) using hilbert curves. Not sure if that is of 
any help.

Hm.


More information about the Digitalmars-d mailing list