[OT] Best algorithm for extremely large hashtable?
Vladimir Panteleev
vladimir at thecybershadow.net
Fri Nov 15 11:29:22 PST 2013
On Friday, 15 November 2013 at 18:43:12 UTC, H. S. Teoh wrote:
> This isn't directly related to D (though the code will be in D), and I
> thought this would be a good place to ask.
>
> I'm trying to implement an algorithm that traverses a very large graph,
> and I need some kind of data structure to keep track of which nodes have
> been visited, that (1) allows reasonably fast lookups (preferably O(1)),
> and (2) doesn't require GB's of storage (i.e., some kind of compression
> would be nice).
A while ago I set out to write a solver for a group of problems
which can be described as breadth-first traversal of extremely
large implicit graphs. Some code is here (C++):
https://github.com/CyberShadow/DDD. The project uses delayed
duplicate detection to allow the number of nodes to exceed
available RAM.
What we found was that in certain cases, delayed duplicate
detection beat hash tables even while filtering duplicates in
memory; I think this is mainly because DDD is more parallelizable
than hash tables.
You mentioned compression - perhaps a Bloom filter would fit your
needs, as an optimization?
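Something along these lines (a minimal sketch, with arbitrarily
chosen sizes and hash mixing; BloomFilter and its constants are
hypothetical, not from the DDD project). A Bloom filter can report
false positives but never false negatives, so it only works as a
cheap pre-filter in front of an exact visited-set check:

// Fixed-size bit array probed by k derived hash values.
struct BloomFilter
{
    import core.bitop : bt, bts;
    size_t[] bits;   // packed bit array
    size_t nbits;    // number of bits
    enum k = 3;      // number of hash probes per key

    this(size_t nbits)
    {
        this.nbits = nbits;
        bits = new size_t[(nbits + size_t.sizeof * 8 - 1) / (size_t.sizeof * 8)];
    }

    // Derive probe i's bit position from a 64-bit mix of the key.
    private size_t probe(ulong key, uint i) const
    {
        ulong h = key * 0x9E3779B97F4A7C15UL + i * 0xC2B2AE3D27D4EB4FUL;
        h ^= h >> 31;
        return cast(size_t)(h % nbits);
    }

    void insert(ulong key)
    {
        foreach (i; 0 .. k)
            bts(bits.ptr, probe(key, i));
    }

    // May return true for a key never inserted (false positive),
    // never returns false for a key that was inserted.
    bool maybeContains(ulong key) const
    {
        foreach (i; 0 .. k)
            if (!bt(bits.ptr, probe(key, i)))
                return false;
        return true;
    }
}

void main()
{
    auto filter = BloomFilter(1 << 20);
    filter.insert(42);
    assert(filter.maybeContains(42));
}

Roughly 10 bits per expected element keeps the false positive rate
around 1%, which is a lot less memory than storing the nodes
themselves.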