[dmd-concurrency] shared arrays

Wed Jan 27 17:02:15 PST 2010

On Wed, 27 Jan 2010 19:19:46 -0500, Michel Fortin  
<michel.fortin at michelf.com> wrote:

> On 2010-01-27 at 18:25, Andrei Alexandrescu wrote:
>
>> Consider a shared array:
>>
>> shared int[] x;
>>
>> What can be done with it? It's tricky; the array has two words' worth of
>> info, and we can't assume 128 bits can be atomically written on a 64-bit
>> machine.
>
> Isn't that a great case for a lock pool controlled with a hash table? If  
> you have dismissed that option, why?
>
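A minimal sketch of the lock pool Michel suggests, written in present-day D with `core.sync.mutex.Mutex`. The pool size, the address hash and the names `lockFor`, `lockedAssign` and `lockedLoad` are hypothetical choices made for illustration, not anything proposed in the thread. Each shared slice is guarded by whichever mutex its address hashes to, so the two-word copy can no longer be observed half done, but every access, including a plain read, has to take a mutex.

```d
import core.sync.mutex : Mutex;

// Hypothetical lock pool: a fixed array of mutexes, selected by hashing
// the address of the shared variable being accessed.
enum poolSize = 64; // hypothetical size

__gshared Mutex[poolSize] lockPool;

shared static this()
{
    foreach (ref m; lockPool)
        m = new Mutex;
}

// Map an address to a mutex. The low 4 bits are dropped because they are
// mostly zero for aligned data and would leave most of the pool unused.
Mutex lockFor(const(void)* addr)
{
    return lockPool[((cast(size_t) addr) >> 4) % poolSize];
}

// Store a whole shared slice while holding the mutex for its address.
void lockedAssign(ref shared(int[]) dst, shared(int[]) src)
{
    auto m = lockFor(cast(const(void)*) &dst);
    m.lock();
    scope (exit) m.unlock();
    dst = src; // still two word stores, but no locked reader sees them half done
}

// Load a whole shared slice under the same mutex.
shared(int[]) lockedLoad(ref shared(int[]) src)
{
    auto m = lockFor(cast(const(void)*) &src);
    m.lock();
    scope (exit) m.unlock();
    return src;
}
```

This only works if every access to the variable goes through the locked helpers. Unrelated variables whose addresses hash to the same slot share a mutex; that costs contention, not correctness, because a given variable always maps to the same mutex.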

Locking for a simple assignment is costly, doesn't scale well and tends to
turn parallel programs into serial ones.

Anyway, the lack of atomic 128-bit loads is equally troublesome, not to
mention that an atomic read has to be done at every indexing of the
variable.
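For reference, a D slice is exactly two machine words, which is why a whole-slice copy on a 64-bit target needs a 128-bit load or store to be atomic, and why indexing (with its bounds check) reads both words. The sketch below checks the size at compile time; `SliceRep` and `asRep` are hypothetical names, and the length-then-pointer field order is taken from the D ABI documentation for dynamic arrays.

```d
// Sketch: a D slice is two machine words. SliceRep and asRep are
// hypothetical helpers; the field order (length, then ptr) follows the
// D ABI documentation for dynamic arrays.
struct SliceRep
{
    size_t length; // first word
    int* ptr;      // second word
}

// Two words: 8 bytes on a 32-bit target, 16 bytes on a 64-bit one.
static assert((int[]).sizeof == 2 * size_t.sizeof);
static assert(SliceRep.sizeof == (int[]).sizeof);

// Reinterpret a slice as its two words (unsafe cast, hence @system).
SliceRep asRep(int[] a) @system
{
    return *cast(SliceRep*) &a;
}
```

For example, `asRep([10, 20, 30]).length` is 3. Nothing in the language guarantees that copying such a value is a single store, so another thread may see one new word next to one old word.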

I did write down a simple routine below using single-word cas/xchg
operations, but it suffers from tearing and inconsistencies if/when two or
more writers assign to x.

Given:
shared int[] x;
shared int[] y;
x = y;

Then:
if(y.length < x.length) { // shrink then set
    x.length = y.length;
    x.ptr    = y.ptr;
} else { // set then expand
    x.ptr    = y.ptr;
    x.length = y.length;
}
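The writer side of this routine can be made concrete with two separately stored words. This is a sketch in present-day D using `core.atomic`, with plain atomic stores standing in for the cas/xchg operations; `TwoWordSlice` and `assignOrdered` are hypothetical names, and the pointer is kept as an integer only so the example needs nothing beyond the `size_t` overloads of `atomicLoad`/`atomicStore`. Each store is individually atomic, and the ordering keeps every intermediate (ptr, length) pair a prefix of either the old or the new array. Like the routine above, it assumes a single writer, and it covers only the writer: a reader that loads the two words separately can still pair a pointer from one array with a length from the other.

```d
import core.atomic : atomicLoad, atomicStore;

// Hypothetical two-word stand-in for a shared int[], one word per field so
// that each can be written with a single atomic store. The pointer is kept
// as an integer only to stay with core.atomic's plain size_t overloads;
// real code would keep a pointer type so the GC can see the reference.
struct TwoWordSlice
{
    shared size_t length;
    shared size_t ptr;
}

// Writer side of the shrink-then-set / set-then-expand routine. At every
// step the (ptr, length) pair describes a prefix of either the old or the
// new array. Assumes a single writer.
void assignOrdered(ref TwoWordSlice x, int[] y)
{
    size_t newPtr = cast(size_t) y.ptr;
    if (y.length < atomicLoad(x.length))
    {
        atomicStore(x.length, y.length); // shrink: old ptr, shorter length
        atomicStore(x.ptr, newPtr);      // then set: new ptr, new length
    }
    else
    {
        atomicStore(x.ptr, newPtr);      // set: new ptr, old (not longer) length
        atomicStore(x.length, y.length); // then expand: new ptr, new length
    }
}
```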

Some searching turned up an answer to the issue:

From http://www.kvraudio.com/forum/viewtopic.php?p=3935218:
tony tony chopper wrote:
>> That was true on the 486 but since Pentium all writes/reads up to 64
>> bits are guaranteed to be atomic if they are naturally aligned: words
>> to 2 bytes, DWORDs to 4, QWORDs to 8.
> How could this be true? Depending on the compiler, a QWORD read/write
> will involve two 32-bit accesses, so how could this be atomic?

Indeed, the situation is that a single load/store with aligned access (or
one within a single cache line, but it's easier to just align) will be
atomic. If you do a 16-byte load/store with SSE on an aligned address,
it's just as atomic (OK, this is from memory; I didn't double-check it, as
I don't immediately recall needing more than 32 bits moved from thread to
thread on the fly). If the compiler splits a 16-byte load/store into four
32-bit loads/stores, it's now four loads/stores and not atomic in any way.
I think nollock is thinking at the opcode level.

What this means: yes, you can move double (and 16-byte) data around with
atomic loads and stores, but the advice to avoid it still holds unless one
knows for sure how a particular compiler compiles different things. An
aligned 32-bit load/store, at least, will always be atomic (for the
current generation of compilers and processors; if they ever remove 32-bit
access from the ISA, a 32-bit store will probably become a
load+modify+store on 64-bit data, which is no longer atomic; I find that
rather unlikely, though).

So my reading of this is that, given that all 64-bit x86 CPUs support
SSE2, atomic reads/writes can be done using 128-bit SSE memory ops on
aligned data. All that's needed, then, is align(16) support in DMD
(align(8) would also be appreciated).