[dmd-concurrency] shared arrays
Robert Jacques
sandford at jhu.edu
Wed Jan 27 17:02:15 PST 2010
On Wed, 27 Jan 2010 19:19:46 -0500, Michel Fortin
<michel.fortin at michelf.com> wrote:
> On 2010-01-27 at 18:25, Andrei Alexandrescu wrote:
>
>> Consider a shared array:
>>
>> shared int[] x;
>>
>> What can be done with it? It's tricky; the array has two words worth of
>> info and we can't assume 128 bits can be atomically written on a 64-bit
>> machine.
>
> Isn't that a great case for a lock pool controlled with a hash table? If
> you have dismissed that option, why?
>
Locking for a simple assignment is costly, doesn't scale well, and tends to
turn parallel programs into serial ones.
Anyway, the lack of atomic 128-bit loads is equally troublesome, not to
mention that an atomic read has to be done at every indexing of the var.
I did write down a simple single cas/xchg routine below, but it suffers
from tearing and inconsistencies if/when two or more writers assign to x.
Given:
    shared int[] x;
    shared int[] y;
    x = y;
Then:
    if (y.length < x.length) { // shrink then set
        x.length = y.length;
        x.ptr = y.ptr;
    } else { // set then expand
        x.ptr = y.ptr;
        x.length = y.length;
    }
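
The routine above is pseudocode, since a real slice doesn't expose its two
words for separate atomic assignment. A minimal compilable sketch of the
same ordering, assuming the two words are modeled by a hypothetical
SliceWords struct (an illustrative name, not a druntime type) and that each
step is a word-sized atomicLoad/atomicStore from core.atomic, might look
like this. It assumes a single writer; with two or more writers the steps
can interleave, which is the tearing mentioned above.

```d
import core.atomic : atomicLoad, atomicStore;

// Hypothetical stand-in for the two words of a shared int[].
struct SliceWords
{
    shared size_t ptr;    // address of the first element, kept as an integer
    shared size_t length; // number of elements
}

// Copies src into dst one word at a time, ordering the two stores so the
// intermediate state is never a longer length paired with the old pointer.
// Assumption: only one thread ever writes to dst.
void assignOrdered(ref SliceWords dst, ref SliceWords src)
{
    size_t newPtr = atomicLoad(src.ptr);
    size_t newLen = atomicLoad(src.length);

    if (newLen < atomicLoad(dst.length))
    {
        // shrink then set
        atomicStore(dst.length, newLen);
        atomicStore(dst.ptr, newPtr);
    }
    else
    {
        // set then expand
        atomicStore(dst.ptr, newPtr);
        atomicStore(dst.length, newLen);
    }
}

unittest
{
    auto x = SliceWords(0x1000, 8);
    auto y = SliceWords(0x2000, 3);
    assignOrdered(x, y); // takes the "shrink then set" branch
    assert(atomicLoad(x.ptr) == 0x2000);
    assert(atomicLoad(x.length) == 3);
}
```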
Some searching turned up an answer to the issue:
From http://www.kvraudio.com/forum/viewtopic.php?p=3935218:
tony tony chopper wrote:
Quote:
That was true on the 486 but since Pentium all writes/reads up to 64 bits
are guaranteed to be atomic if they are naturally aligned. Words to 2
bytes, DWORDs to 4, QWORDs to 8.
How could this be true? Depending on the compiler, a QWORD read/write will
involve 2 32bit accesses, how could this be atomic?
Indeed, the situation is that a single load/store involving aligned access
(or cache line, but it's easier to just align), will be atomic. If you do
a 16-byte load/store with SSE on aligned address, it's just as atomic (ok,
this is from memory, I didn't double check it, as I don't immediately
recall needing more than 32-bits moved from thread to thread on the fly).
If the compiler splits a 16-byte load/store into 4 32-bit loads/stores,
it's now four loads/stores and not atomic in any way. I think nollock is
thinking on opcode level.
What this means: yes, you can move double (and 16-byte) data around with
atomic loads and stores, but the part about avoiding it still holds,
unless one knows for sure how a particular compiler compiles different
things. Aligned 32-bit load/store at least will always be atomic (for the
current generation of compilers and processors; if they ever remove 32-bit
access from the ISA, then a 32-bit store will probably become
load+modify+store on 64-bit data which is no longer atomic; I find that
rather unlikely though).
So, my read on this is that, given all 64-bit x86 CPUs support SSE2, atomic
reads/writes can be done using 128-bit SSE memory ops on aligned data. So
all that's needed is align(16) support in DMD (align(8) would also be
appreciated).
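
A minimal sketch of what the aligned two-word approach could look like,
assuming a druntime whose core.atomic supports 16-byte operations (it
exposes the has128BitCAS flag for this). SlicePair is a hypothetical
stand-in for the ptr/length pair, not the compiler's actual layout of a
shared array. Note the swapped-in technique: as far as I can tell,
core.atomic implements these 16-byte operations on x86-64 with a locked
CMPXCHG16B and not with plain SSE moves, so this illustrates the align(16)
requirement and not the SSE atomicity claim itself.

```d
import core.atomic : atomicLoad, atomicStore, has128BitCAS;

// Hypothetical two-word stand-in for the ptr/length pair of an int[].
struct SlicePair
{
    size_t ptr;    // address of the first element, kept as an integer
    size_t length; // number of elements
}

static if (has128BitCAS)
{
    // 16-byte alignment is required for a single 128-bit access.
    align(16) shared SlicePair x;

    unittest
    {
        // A writer replaces both words in one step...
        atomicStore(x, SlicePair(0x1000, 4));

        // ...and a reader takes one consistent snapshot before indexing.
        align(16) SlicePair snap = atomicLoad(x);
        assert(snap.ptr == 0x1000 && snap.length == 4);
    }
}
```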