[OT] Best algorithm for extremely large hashtable?

H. S. Teoh hsteoh at quickfur.ath.cx
Fri Nov 15 10:41:57 PST 2013


This isn't directly related to D (though the code will be in D), and I
thought this would be a good place to ask.

I'm trying to implement an algorithm that traverses a very large graph,
and I need some kind of data structure to keep track of which nodes
have been visited. It should (1) allow reasonably fast lookups
(preferably O(1)), and (2) not require gigabytes of storage (i.e., some
kind of compression would be nice).
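
For concreteness, here's roughly the baseline I'm starting from: D's
built-in associative array keyed by the coordinate vector. (Just a
sketch; Node, isVisited, and markVisited are placeholder names.) This
gives O(1) lookups, but stores every visited node uncompressed:

alias Node = immutable(ubyte)[]; // n coordinates, each a small integer

bool[Node] visited; // built-in AA used as a visited set

bool isVisited(Node n) { return (n in visited) !is null; }
void markVisited(Node n) { visited[n] = true; }

void main()
{
    Node a = [1, 2, 3];
    markVisited(a);
    assert(isVisited(a));
    assert(!isVisited([3, 2, 1]));
}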

The graph nodes can be represented in various ways, but possibly the
most convenient representation is as n-dimensional vectors of
(relatively small) integers. Furthermore, graph edges are always
between vectors that differ in exactly one coordinate, so the edges of
the graph may be thought of as a subset of the edges of an
n-dimensional grid. The hashtable therefore needs to represent some
connected subset of this grid in a space-efficient manner that still
allows fast lookups.
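
One thing this representation does buy us: since each coordinate is a
small integer (roughly 0..9), two coordinates fit in one byte, so even
a 100-dimensional node packs into a 50-byte hashtable key. A rough
sketch of what I mean (packKey is a made-up helper, and nibble packing
is just the obvious first cut):

import std.exception : assumeUnique;

// Pack a coordinate vector into nibbles, halving the key size.
immutable(ubyte)[] packKey(const(ubyte)[] coords)
{
    auto buf = new ubyte[(coords.length + 1) / 2];
    foreach (i, c; coords)
    {
        assert(c < 16, "each coordinate must fit in a nibble");
        if (i & 1)
            buf[i / 2] |= cast(ubyte)(c << 4); // odd index: high nibble
        else
            buf[i / 2] = c;                    // even index: low nibble
    }
    return assumeUnique(buf);
}

void main()
{
    auto key = packKey([1, 2, 3, 4, 5]);
    assert(key.length == 3); // five coordinates -> three bytes
}

That only shaves a constant factor off each entry, though; the real
savings would have to come from sharing structure between entries.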

The naïve approach of using an n-dimensional bit array is not feasible:
n can be quite large (up to 100 or so), and the grid extends up to
about 10 positions in each direction, so we're looking at a potential
maximum of 10^100 grid cells, which is clearly impractical to store
explicitly.

Are there any known good algorithms for tackling this problem?

Thanks!


T

-- 
Creativity is not an excuse for sloppiness.

