[OT] Best algorithm for extremely large hashtable?

Fri Nov 15 11:20:20 PST 2013

On Fri, Nov 15, 2013 at 10:52:15PM +0400, Dmitry Olshansky wrote:
> 15-Nov-2013 22:41, H. S. Teoh пишет:
> >This isn't directly related to D (though the code will be in D), and
> >I thought this would be a good place to ask.
> >
> >I'm trying to implement an algorithm that traverses a very large
> >graph, and I need some kind of data structure to keep track of which
> >nodes have been visited, that (1) allows reasonably fast lookups
> >(preferably O(1)),
> 
> Store as a bit-flag in whatever structure that serves as node. It's
> going to be the fastest anyway and you indeed get to use only 1 bit
> per node (and given padding and whatnot you may already have a couple
> of bytes to spare per node).
[...]

There are too many nodes to keep in memory, actually. The algorithm also
doesn't need to store the nodes; it can easily generate them on-the-fly
when it needs them.  The challenge is when it needs to know whether a
particular generated node has been visited before. I'd like to be able
to have a fast lookup (hopefully O(1)) that tells me whether node X has
been seen before. The problem is, there may have been a very large
number of previously-seen nodes, so I need some way of storing this
information in a space-efficient way.

I'm hoping for a kind of compressed representation where multiple
adjacent nodes can be represented in a very compact way, by somehow
taking advantage of the fact that they lie on an n-dimensional grid and
the fact that graph edges are always a subset of grid edges (as opposed
to edges between two arbitrary nodes).

T

-- 
I think Debian's doing something wrong, `apt-get install pesticide', doesn't seem to remove the bugs on my system! -- Mike Dresser