[OT] Efficient file structure for very large lookup tables?
Ola Fosheim Grøstad" <ola.fosheim.grostad+dlang at gmail.com>
Ola Fosheim Grøstad" <ola.fosheim.grostad+dlang at gmail.com>
Tue Dec 17 13:26:15 PST 2013
On Tuesday, 17 December 2013 at 20:54:56 UTC, H. S. Teoh wrote:
> big it got before I killed the process). Suffice it to say that
> this is
> a combinatorial problem, so the number of entries grow
> exponentially;
> anything that can help reduce the storage requirements / I/O
> latency
> would be a big help.
If you can buffer the queries in a queue then you can issue a
prefetch-request to the OS to bring in the memory-page from disk
when you put it into the queue to prevent the process from being
put to sleep, the length of the queue has to be tailored to how
fast it can load the page.
If the data is uniformly distributed then perhaps you could
partition the disk-space with a n-dimensional grid, and then have
a key-value store that you page into memory?
If you can do some queries out of order then you probably should
set up some "buckets"/"bins" in the queue and prefetch the page
that is referenced by the fullest/oldest bucket if you can do
some queries out of order. Just to avoid that pages are pushed in
and out of memory all the time.
Otherwise a kd-tree with a scaled grid per leaf that stays within
a memorypage, probably could work, but that sounds like a lot of
work. Implementing n-dimensional datastructures is conceptually
easy, but tedious in reality (can you trust it to be bug free? It
is hard to visualize in text.).
If you have to do the spatial datastructure yourself, then
perhaps an octree or n-dimensional oc-tree would be helpful. You
could make it shallow and in memory and use it both to buffer
queries and to store indexes to disk-data. It would be 2^N nodes
per level. (2D data has 4 nodes, 3D has 8 nodes)
I once read a paper of mapping GIS data to a regular database
(mapping 2D to 1D) using hilbert curves. Not sure if that is of
any help.
Hm.
More information about the Digitalmars-d
mailing list