Are Gigantic Associative Arrays Now Possible?
Laeeth Isharc via Digitalmars-d
digitalmars-d at puremagic.com
Thu Mar 23 00:12:29 PDT 2017
On Wednesday, 22 March 2017 at 20:00:56 UTC, dlangPupil wrote:
> Hello to all! As a newbie (to D, to coding, and to forums) I'm
> learning about D's associative arrays (AAs) and their tiny
> latency regardless of AA size. Cool!
>
> One practical limit on the maximum AA size is memory/disk
> paging. But maybe this limit could be overcome
> with the latest SSDs, whose nonvolatile memory can be addressed
> like RAM.
>
> The article below says that Intel Optane SSDs:
> -allow reads and writes on individual bytes.
> -have roughly 10x the latency of DRAM (but AAs' latency is so
> low that this might not matter in many cases).
> -currently offer 375GB of "RAM" for $1,500.
> -will support up to 3 TB on 2 socket Xeon systems (48TB on
> 4-socket).
> -will be supplemented with Optane DIMMs in the future.
>
> Some questions that arise are...
>
> 1) Wouldn't using such "RAM" eliminate any paging issue for
> super-gigantic AAs?
> 2) What other bottlenecks could arise for gigantic AAs, e.g.,
> garbage collection?
> 3) Would an append-only data design mitigate GC or other
> bottlenecks?
> 4) Has anyone tried this out?
>
> What a coup if D could "be the first" lang to make this
> practical. Thanks.
>
> https://arstechnica.com/information-technology/2017/03/intels-first-optane-ssd-375gb-that-you-can-also-use-as-ram/
Hi.
I am very interested in this topic, although I don't really have
answers for your questions.
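On your question (3), though, here is the rough shape of an
append-only design in D - a very rough sketch, with made-up keys
and a made-up layout: the AA itself holds only small fixed-size
indices, while the values accumulate in one contiguous array,
which gives the GC far fewer pointers to chase than a table full
of heap references.

    import std.stdio;

    void main()
    {
        // Values accumulate in one contiguous, append-only
        // array; the AA maps each key to an index into it.
        double[] store;
        size_t[string] index;

        void put(string key, double value)
        {
            index[key] = store.length;
            store ~= value;
        }

        double get(string key)
        {
            return store[index[key]];
        }

        put("answer", 42.0);
        writeln(get("answer"));  // prints 42
    }

Whether that actually helps at the hundreds-of-gigabytes scale
you describe, I can't say - the AA's own buckets are still
GC-allocated - but it's the kind of reshaping I would try first.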
See the ACM article that I think I mentioned on the forum last
year:
https://queue.acm.org/detail.cfm?id=2874238
"For the entire careers of most practicing computer scientists, a
fundamental observation has consistently held true: CPUs are
significantly more performant and more expensive than I/O
devices. The fact that CPUs can process data at extremely high
rates, while simultaneously servicing multiple I/O devices, has
had a sweeping impact on the design of both hardware and software
for systems of all sizes, for pretty much as long as we've been
building them.
This assumption, however, is in the process of being completely
invalidated.
The arrival of high-speed, non-volatile storage devices,
typically referred to as Storage Class Memories (SCM), is likely
the most significant architectural change that datacenter and
software designers will face in the foreseeable future. SCMs are
increasingly part of server systems, and they constitute a
massive change: the cost of an SCM, at $3-5k, easily exceeds that
of a many-core CPU ($1-2k), and the performance of an SCM
(hundreds of thousands of I/O operations per second) is such that
one or more entire many-core CPUs are required to saturate it.
This change has profound effects:
1. The age-old assumption that I/O is slow and computation is
fast is no longer true: this invalidates decades of design
decisions that are deeply embedded in today's systems.
2. The relative performance of layers in systems has changed by a
factor of a thousand times over a very short time: this requires
rapid adaptation throughout the systems software stack.
3. Piles of existing enterprise datacenter
infrastructure—hardware and software—are about to become useless
(or, at least, very inefficient): SCMs require rethinking the
compute/storage balance and architecture from the ground up.
"
It's a massive relative price shock - storage vs CPU - and the
hardware and software structures built around the old ratio will
need to be utterly transformed. Intel say it's the biggest
technological breakthrough since the internet.
I'm good at recognising moments of exhaustion in old trends and
the beginnings of new ones, and I've thought for some time now
that people were complacently squandering CPU performance just
as conditions were shifting to make efficiency matter again.
https://www.quora.com/Why-is-Python-so-popular-despite-being-so-slow/answer/Laeeth-Isharc?srid=35gE
I don't know how data structures and file systems should adapt to
this. But I do think the prospective return on efficient code
just went up a lot - as far as I can see, this shift is very good
for D.
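One concrete thing you can already do in D is map a file on such
a device straight into the address space with std.mmfile and
treat it as ordinary memory. A minimal sketch - the file name
and the 1 GiB size here are just placeholders:

    import std.mmfile;

    void main()
    {
        // Map a 1 GiB file (created if it doesn't exist) into
        // the address space; access goes through ordinary
        // slices instead of read()/write() calls.
        auto mm = new MmFile("aa-backing.bin",
                             MmFile.Mode.readWrite,
                             1024UL ^^ 3, null);
        auto bytes = cast(ubyte[]) mm[];
        bytes[0] = 42;  // byte-granularity writes, persisted
                        // by the OS
    }

An AA whose buckets lived in a region like that, rather than in
GC memory, would sidestep both the paging question and most of
the GC pressure - though as far as I know nothing in Phobos
offers that out of the box today.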