[dmd-concurrency] word tearing status in today's processors
Robert Jacques
sandford at jhu.edu
Wed Jan 27 08:47:46 PST 2010
On Wed, 27 Jan 2010 10:10:49 -0500, Andrei Alexandrescu
<andrei at erdani.com> wrote:
> Hello,
>
>
> I'm looking for _hard data_ on how today's processors address word tearing.
> As usual, googling for word tearing yields the usual mix of vague
> information, folklore, and opinionated newsgroup discussions.
>
> In particular:
>
> a) Can we assume that all or most of today's processors are able to
> write memory at byte level?
Not sure. Both x86 and ARM seem to have byte-store instructions.
> b) If not, is it reasonable to have the compiler insert for sub-word
> shared assignments a call to a function that avoids word tearing by
> means of a CAS loop?
Yes, in general, though on x86 xchg (not CAS) should be used instead.
> c) For 64-bit data (long and double), am I right in assuming that all
> non-ancient Intel32 processors do offer a means to atomically assign
> 64-bit data? (What are those asm instructions?) For processors that
> don't (Intel or not), can we/should we guarantee at the language level
> that 64-bit writes are atomic? We could effect that by using e.g. a
> federation of hashed locks, or even (gasp!) two global locks, one for
> long and one for double, and do something cleverer when public outrage
> puts our lives in danger. Java guarantees atomic assignment for volatile
> data, but I'm not sure what mechanisms implementations use.
The instruction you're looking for is CMPXCHG8B for 32-bit x86 CPUs. It has
been around since the Pentium (the 486 introduced the 32-bit CMPXCHG). Other
CPUs generally use a load-linked/store-conditional (LL/SC) pair instead.
From Wikipedia:
All of Alpha, PowerPC, MIPS, and ARM have LL/SC instructions: ldl_l/stl_c
and ldq_l/stq_c (Alpha), lwarx/stwcx (PowerPC), ll/sc (MIPS), and
ldrex/strex (ARM version 6 and above).
Most platforms provide multiple sets of instructions for different data
sizes, e.g. ldarx/stdcx for doubleword on the PowerPC.
Some CPUs require the address being accessed exclusively to be configured
in write-through mode.
Some CPUs track the load-linked address at a cache-line or other
granularity, such that any modification to any portion of the cache line
(whether via another core's store-conditional or merely by an ordinary
store) is sufficient to cause the store-conditional to fail.
All of these platforms provide weak LL/SC. The PowerPC implementation is
the strongest, allowing an LL/SC pair to wrap loads and even stores to
other cache lines. This allows it to implement, for example, lock-free
reference counting in the face of changing object graphs with arbitrary
counter reuse (which otherwise requires DCAS).
And from an ARM website (STREXD is 64-bit):
ARM LDREX and STREX are available in ARMv6 and above.
ARM LDREXB, LDREXH, LDREXD, STREXB, STREXD, and STREXH are available in
ARMv6K and above.
All these 32-bit Thumb instructions are available in ARMv6T2 and above,
except that LDREXD and STREXD are not available in the ARMv7-M profile.
ARM has also had a swap-byte instruction (SWPB) since v4, which may or may
not be equivalent to LDREXB/STREXB.
So I think it's safe to say that atomic 64-bit writes can be implemented
efficiently on most CPUs out there, and making a language-level guarantee is
okay.
Warning: most of this came from some quick Google searches, so I don't
know whether there are other gotchas out there.
>
> Thanks,
>
> Andrei
> _______________________________________________
> dmd-concurrency mailing list
> dmd-concurrency at puremagic.com
> http://lists.puremagic.com/mailman/listinfo/dmd-concurrency