Are Gigantic Associative Arrays Now Possible?

H. S. Teoh via Digitalmars-d digitalmars-d at puremagic.com
Fri Mar 24 10:48:35 PDT 2017


On Fri, Mar 24, 2017 at 04:59:00PM +0000, dlangPupil via Digitalmars-d wrote:
[...]
> In addition to prevalence of random lookups, certain additional app
> conditions and designs might make "gigantic AAs" more useful or
> competitive with alternatives:
> 
> 1.  Use by programmers who need a data structure that's easier to
> reason about or faster and safer to implement or prototype, and easier
> for future users to read and maintain.

This point is arguable, since a B-tree is pretty well-known and thus not
difficult to reason about. Generally, whether you use a hashtable, a
B-tree, or something else, the high-level view is that it's some opaque
data structure that, given some key K, returns some value V.  Which
underlying algorithm is used is an implementation detail.

And generally as far as application programmers are concerned, you
wouldn't want to implement an advanced data structure from scratch;
you'd use an off-the-shelf library that has already implemented it.
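
To illustrate (a minimal D sketch; the names KeyValueStore, HashStore,
put and lookup are just made up for this example): the application code
only ever sees "key in, value out", so the backing container can be
swapped without touching the callers.

// Hypothetical interface: callers only see "given a key, return a value".
interface KeyValueStore
{
    void put(string key, string value);
    string* lookup(string key);   // null if the key is absent
}

// One possible backing: D's built-in associative array (a hashtable).
// A B-tree-backed class could implement the same interface unchanged.
class HashStore : KeyValueStore
{
    private string[string] table;

    void put(string key, string value) { table[key] = value; }
    string* lookup(string key) { return key in table; }
}

void main()
{
    KeyValueStore store = new HashStore;
    store.put("Smith", "record 42");
    assert(*store.lookup("Smith") == "record 42");
    assert(store.lookup("Jones") is null);
}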


> 2.  Apps with large tables that use (or could use) non-natural keys,
> e.g., crypto-quality GUIDs, which can't benefit from (or impose the
> extra computational cost of) the sorting that must be performed and
> maintained on natural key values (e.g., LastName="Smith") in order to
> use a B-Tree search.

Crypto-quality GUIDs are a very good point.  You probably can't expect
much locality from keys like that.

But this point and the following one presume a rather specific use case,
i.e., document retrieval, presumably over the network. The application
of hashtables and B-trees, however, is far wider in scope.  In
particular, what I had in mind when I asked about cache speeds was
compute-intensive (as opposed to I/O-intensive) processing. These two
application domains are quite different in some ways.

In document retrieval, for example, if you view it as a server serving
documents over the network to a set of clients, having 1 I/O roundtrip
per lookup is probably acceptable, because the time it takes to send the
document over the network to the client is large enough that making
lookups any faster probably won't make too much of a difference.
Besides, if client requests are essentially unpredictable, then 1 I/O
roundtrip is about the best you could do.

In CPU-intensive computations, however, the cache hierarchy becomes very
prominent, because you're talking about CPU speeds vs. RAM speeds, and
that gap is currently rather large. Being cache-friendly can make an
orders-of-magnitude difference in overall performance.  If you're
talking about data lookups inside a compute-intensive inner loop, having
1 I/O roundtrip per iteration is unacceptable.  Even 1 L1 cache miss per
iteration may be unacceptable, depending on the application.  Ideally,
you want to maximize the number of iterations per L1 cache miss, and
then, on the next level, have as many of those L1 misses as possible
served from L2 before you have to go to L3 or all the way out to RAM,
because RAM is slow, relatively speaking, even if it's still very fast
compared to an I/O roundtrip to the hard disk.  In other words, you want
as much locality as you can possibly get in your memory accesses -- the
entire cache hierarchy, after all, is built on the premise that memory
accesses exhibit locality.
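
As a rough illustration (a toy D sketch, not a rigorous benchmark):
summing the same array in sequential order vs. a randomly shuffled order
exercises exactly this difference.  Once the array is much larger than
the caches, the shuffled traversal typically runs several times slower
on commodity hardware.

import std.array : array;
import std.random : randomShuffle;
import std.range : iota;
import std.stdio : writefln;
import core.time : MonoTime;

void main()
{
    enum n = 1 << 24;                  // ~16M longs (~128 MB), well past L3
    auto data = new long[n];
    foreach (i, ref x; data) x = cast(long) i;

    auto seqOrder = iota(n).array;     // 0, 1, 2, ..., n-1
    auto rndOrder = seqOrder.dup;
    rndOrder.randomShuffle();          // same indices, cache-hostile order

    static long run(const(long)[] d, const(int)[] idx)
    {
        long sum = 0;
        foreach (i; idx) sum += d[i];  // one lookup per iteration
        return sum;
    }

    auto t0 = MonoTime.currTime;
    auto a = run(data, seqOrder);
    auto t1 = MonoTime.currTime;
    auto b = run(data, rndOrder);
    auto t2 = MonoTime.currTime;
    assert(a == b);                    // same work, different access pattern

    writefln("sequential: %s   shuffled: %s", t1 - t0, t2 - t1);
}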

Furthermore, when it comes to CPU-intensive applications, quite often
you already know the pattern of lookups beforehand, since it's usually
coming from within the application itself, not from external clients
that may be completely unpredictable.  So you can usually predict the
next n memory accesses much more easily, and take advantage of it by
organizing your data structure around that.
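
For instance (a hypothetical D sketch; Record and batchLookup are
made-up names, and the index is assumed to be kept sorted by key): if
you know the next batch of lookups ahead of time, you can sort the batch
and answer it with a single forward scan over the index, which is about
as cache-friendly as lookups get.

import std.algorithm.sorting : sort;

struct Record { ulong key; double value; }

// Assumes `index` is sorted by key and each key appears at most once.
double[] batchLookup(const(Record)[] index, ulong[] queries)
{
    sort(queries);                     // put the known batch in key order

    double[] results;
    size_t i = 0;
    foreach (q; queries)
    {
        // Successive queries pick up where the previous one stopped, so
        // the index is touched mostly in increasing address order.
        while (i < index.length && index[i].key < q)
            ++i;
        if (i < index.length && index[i].key == q)
            results ~= index[i].value;
    }
    return results;
}

unittest
{
    auto index = [Record(1, 0.5), Record(7, 2.5), Record(42, 3.0)];
    assert(batchLookup(index, [42UL, 1, 99]) == [0.5, 3.0]);
}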


[...]
> Finally, re: caches: I haven't found whether it is or isn't possible
> to combine a server's DRAM and its Optane SSD RAM or DIMMs to form a
> single pool of RAM.  Mixing DIMMS of different speeds is a no-no; but
> if this could be done, then the hottest data could be "cached" in DRAM
> and thus spared the 10x latency penalty of the SSD device.

If the SSD device incurs a 10x latency penalty, that's still a far cry
from L1/L2 cache latencies!  True, it will help in the worst case when
you have no choice but to take an I/O roundtrip (a hard drive would be
far slower)... but this would hardly be anything revolutionary if DRAM
is still faster.  At the end of the day, you're still talking about a
memory hierarchy with L1/L2 at the top, L3 and RAM in between, and hard
drive / SSD at the bottom.  If so, then the cache-unfriendliness of
hashtables still applies, and we still have to work on finding ways to
improve hashtable locality, e.g., with locality-sensitive hashing,
cache-oblivious hashtables, etc., or else use a more cache-friendly data
structure like B-trees or one of the newer cache-oblivious trees.
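
As one simple illustration of the kind of locality improvement I mean (a
toy D sketch; FlatMap is a made-up name, this is plain open addressing
with linear probing rather than any of the fancier schemes above, and it
has no resizing or deletion): keeping all entries in one flat array
means a collision run occupies adjacent slots, so a probe sequence
usually stays within a cache line or two instead of chasing pointers to
scattered heap nodes.

struct FlatMap(K, V)
{
    private struct Slot { K key; V value; bool used; }
    private Slot[] slots;

    this(size_t capacity) { slots = new Slot[capacity]; }

    void insert(K key, V value)
    {
        // Walk forward from the hashed slot until we find the key or a
        // free slot.  (Sketch only: the table must never fill up.)
        for (size_t i = hashOf(key) % slots.length; ; i = (i + 1) % slots.length)
        {
            if (!slots[i].used || slots[i].key == key)
            {
                slots[i] = Slot(key, value, true);
                return;
            }
        }
    }

    V* lookup(K key)
    {
        for (size_t i = hashOf(key) % slots.length; slots[i].used;
             i = (i + 1) % slots.length)
        {
            if (slots[i].key == key)
                return &slots[i].value;
        }
        return null;
    }
}

unittest
{
    auto m = FlatMap!(ulong, double)(64);
    m.insert(42, 3.14);
    assert(*m.lookup(42) == 3.14);
    assert(m.lookup(7) is null);
}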

(In my case, though, B-trees may not represent much of an improvement,
because I'm dealing with high-dimensional data that cannot be easily
linearized to take maximum advantage of B-tree locality. So at some
level I still need some kind of hash-like structure to work with my
data. But it will probably have some tree-like structure to it, because
of the (high-dimensional) locality it exhibits.)


T

-- 
Without geometry, life would be pointless. -- VS

