[OT] Best algorithm for extremely large hashtable?

Fri Nov 15 12:16:11 PST 2013

On Friday, 15 November 2013 at 19:21:35 UTC, H. S. Teoh wrote:
> On Fri, Nov 15, 2013 at 10:52:15PM +0400, Dmitry Olshansky 
> wrote:
>> 15-Nov-2013 22:41, H. S. Teoh пишет:
>> >This isn't directly related to D (though the code will be in 
>> >D), and
>> >I thought this would be a good place to ask.
>> >
>> >I'm trying to implement an algorithm that traverses a very 
>> >large
>> >graph, and I need some kind of data structure to keep track 
>> >of which
>> >nodes have been visited, that (1) allows reasonably fast 
>> >lookups
>> >(preferably O(1)),
>> 
>> Store as a bit-flag in whatever structure that serves as node. 
>> It's
>> going to be the fastest anyway and you indeed get to use only 
>> 1 bit
>> per node (and given padding and whatnot you may already have a 
>> couple
>> of bytes to spare per node).
> [...]
>
> There are too many nodes to keep in memory, actually. The 
> algorithm also
> doesn't need to store the nodes; it can easily generate them 
> on-the-fly
> when it needs them.  The challenge is when it needs to know 
> whether a
> particular generated node has been visited before. I'd like to 
> be able
> to have a fast lookup (hopefully O(1)) that tells me whether 
> node X has
> been seen before. The problem is, there may have been a very 
> large
> number of previously-seen nodes, so I need some way of storing 
> this
> information in a space-efficient way.
>
> I'm hoping for a kind of compressed representation where 
> multiple
> adjacent nodes can be represented in a very compact way, by 
> somehow
> taking advantage of the fact that they lie on an n-dimensional 
> grid and
> the fact that graph edges are always a subset of grid edges (as 
> opposed
> to edges between two arbitrary nodes).

A Trie might be an idea, especially the bitwise variant.

http://en.wikipedia.org/wiki/Trie