[OT] Best algorithm for extremely large hashtable?

qznc qznc at web.de
Fri Nov 15 12:03:17 PST 2013


On Friday, 15 November 2013 at 18:43:12 UTC, H. S. Teoh wrote:
> This isn't directly related to D (though the code will be in 
> D), and I
> thought this would be a good place to ask.
>
> I'm trying to implement an algorithm that traverses a very 
> large graph,
> and I need some kind of data structure to keep track of which 
> nodes have
> been visited, that (1) allows reasonably fast lookups 
> (preferably O(1)),
> and (2) doesn't require GB's of storage (i.e., some kind of 
> compression
> would be nice).
>
> The graph nodes can be represented in various ways, but 
> possibly the
> most convenient representation is as n-dimensional vectors of
> (relatively small) integers. Furthermore, graph edges are 
> always between
> vectors that differ only by a single coordinate; so the edges 
> of the
> graph may be thought of as a subset of the edges of an 
> n-dimensional
> grid. The hashtable, therefore, needs to represent some 
> connected subset
> of this grid in a space-efficient manner, that still allows 
> fast lookup
> times.
>
> The naïve approach of using an n-dimensional bit array is not 
> feasible
> because n can be quite large (up to 100 or so), and the size of 
> the grid
> itself can get up to about 10 in each direction, so we're 
> looking at a
> potential maximum size of 10^100, clearly impractical to store
> explicitly.

So, -10 to 10 in discrete steps. This means 5 bits per dimension 
and 500 bits for a single coordinate. Is the graph distributed of 
a compute cluster or does it fit into single computer? With a few 
GB of RAM, this means your graph is quite sparse, yet nodes are 
connected ("differ only by a single coordinate")?

Can you preprocess? I mean, walk all the nodes O(n) to compute a 
good (perfect?) hash function.

In general, I think you should either store the flag right in the 
graph node or mirror the graph structure.

I do not know any concrete algorithms.


More information about the Digitalmars-d mailing list