[OT] Best algorithm for extremely large hashtable?
qznc
qznc at web.de
Fri Nov 15 12:03:17 PST 2013
On Friday, 15 November 2013 at 18:43:12 UTC, H. S. Teoh wrote:
> This isn't directly related to D (though the code will be in
> D), and I
> thought this would be a good place to ask.
>
> I'm trying to implement an algorithm that traverses a very
> large graph,
> and I need some kind of data structure to keep track of which
> nodes have
> been visited, that (1) allows reasonably fast lookups
> (preferably O(1)),
> and (2) doesn't require GB's of storage (i.e., some kind of
> compression
> would be nice).
>
> The graph nodes can be represented in various ways, but
> possibly the
> most convenient representation is as n-dimensional vectors of
> (relatively small) integers. Furthermore, graph edges are
> always between
> vectors that differ only by a single coordinate; so the edges
> of the
> graph may be thought of as a subset of the edges of an
> n-dimensional
> grid. The hashtable, therefore, needs to represent some
> connected subset
> of this grid in a space-efficient manner, that still allows
> fast lookup
> times.
>
> The naïve approach of using an n-dimensional bit array is not
> feasible
> because n can be quite large (up to 100 or so), and the size of
> the grid
> itself can get up to about 10 in each direction, so we're
> looking at a
> potential maximum size of 10^100, clearly impractical to store
> explicitly.
So, -10 to 10 in discrete steps. This means 5 bits per dimension
and 500 bits for a single coordinate. Is the graph distributed of
a compute cluster or does it fit into single computer? With a few
GB of RAM, this means your graph is quite sparse, yet nodes are
connected ("differ only by a single coordinate")?
Can you preprocess? I mean, walk all the nodes O(n) to compute a
good (perfect?) hash function.
In general, I think you should either store the flag right in the
graph node or mirror the graph structure.
I do not know any concrete algorithms.
More information about the Digitalmars-d
mailing list