[dmd-concurrency] word tearing status in today's processors
Andrei Alexandrescu
andrei at erdani.com
Wed Jan 27 13:42:18 PST 2010
Thanks, Robert. This is very useful!
Andrei
Robert Jacques wrote:
> On Wed, 27 Jan 2010 10:10:49 -0500, Andrei Alexandrescu
> <andrei at erdani.com> wrote:
>
>> Hello,
>>
>>
>> I'm looking for _hard data_ on how today's processors address word
>> tearing. As usual, googling for word tearing yields the usual mix of
>> vague information, folklore, and opinionated newsgroup discussions.
>>
>> In particular:
>>
>> a) Can we assume that all or most of today's processors are able to
>> write memory at byte level?
>
> Not sure. Both x86 and ARM seem to have set byte instructions.
>
>> b) If not, is it reasonable to have the compiler insert for sub-word
>> shared assignments a call to a function that avoids word tearing by
>> means of a CAS loop?
>
> Yes, in general, though on x86 xchg (not CAS) should be used instead.
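A minimal sketch, not from the thread, of the CAS-loop fallback discussed in (b), for a target that can only write whole 32-bit words. The helper name `store_byte_cas` and the byte-index convention are assumptions made for illustration. It uses C11 `<stdatomic.h>` and says nothing about what a D compiler would actually emit.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helper: replace one byte inside a shared 32-bit word
   without disturbing the three neighbouring bytes.
   `index` counts bytes from the least significant one (0..3). */
static void store_byte_cas(_Atomic uint32_t *word, unsigned index, uint8_t value)
{
    assert(index < 4);
    const unsigned shift = index * 8u;
    const uint32_t mask = (uint32_t)0xFFu << shift;

    uint32_t old = atomic_load_explicit(word, memory_order_relaxed);
    uint32_t desired;
    do {
        /* Keep the neighbours as last observed, splice in the new byte. */
        desired = (old & ~mask) | ((uint32_t)value << shift);
        /* On failure `old` is refreshed with the current contents,
           so the next iteration rebuilds `desired` from fresh data. */
    } while (!atomic_compare_exchange_weak_explicit(
                 word, &old, desired,
                 memory_order_release, memory_order_relaxed));
}
```

If another thread changes a neighbouring byte between the load and the compare-exchange, the exchange fails and the loop retries, so no concurrent update to the same word is lost.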
>
>> c) For 64-bit data (long and double), am I right in assuming that all
>> non-ancient Intel32 processors do offer a means to atomically assign
>> 64-bit data? (What are those asm instructions?) For processors that
>> don't (Intel or not), can we/should we guarantee at the language level
>> that 64-bit writes are atomic? We could effect that by using e.g. a
>> federation of hashed locks, or even (gasp!) two global locks, one for
>> long and one for double, and do something cleverer when public outrage
>> puts our lives in danger. Java guarantees atomic assignment for
>> volatile data, but I'm not sure what mechanisms implementations use.
>
> The instruction you're looking for is CMPXCHG8B for 32-bit x86 CPUs.
> It's been around since the original Pentium (the 486 added only the
> 32-bit CMPXCHG). Other CPUs generally use a load-linked/
> store-conditional (LL/SC) pair. From Wikipedia:
> All of Alpha, PowerPC, MIPS, and ARM have LL/SC instructions:
> ldl_l/stl_c and ldq_l/stq_c (Alpha), lwarx/stwcx (PowerPC), ll/sc
> (MIPS), and ldrex/strex (ARM version 6 and above).
>
> Most platforms provide multiple sets of instructions for different data
> sizes, e.g. ldarx/stdcx for doubleword on the PowerPC.
> Some CPUs require the address being accessed exclusively to be
> configured in write-through mode.
> Some CPUs track the load-linked address at a cache-line or other
> granularity, such that any modification to any portion of the cache line
> (whether via another core's store-conditional or merely by an ordinary
> store) is sufficient to cause the store-conditional to fail.
> All of these platforms provide weak LL/SC. The PowerPC implementation is
> the strongest, allowing an LL/SC pair to wrap loads and even stores to
> other cache lines. This allows it to implement, for example, lock-free
> reference counting in the face of changing object graphs with arbitrary
> counter reuse (which otherwise requires DCAS).
>
> And from an ARM website (STREXD is 64-bit):
> ARM LDREX and STREX are available in ARMv6 and above.
> ARM LDREXB, LDREXH, LDREXD, STREXB, STREXD, and STREXH are available in
> ARMv6K and above.
> All these 32-bit Thumb instructions are available in ARMv6T2 and above,
> except that LDREXD and STREXD are not available in the ARMv7-M profile.
>
> ARM has also had a swap-byte instruction since v4, which may or may not
> be equivalent to LDREXB/STREXB.
>
> So I think it's safe to say that 64-bit writes will be efficient on most
> CPUs out there and making a language level guarantee is okay.
>
> Warning: most of this came from some quick Google searches, so I don't
> know if there are other gotchas out there.
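As a rough illustration, not from the thread, of a language-level guarantee for 64-bit writes, the sketch below routes a `double` through a 64-bit atomic using C11 `<stdatomic.h>`. The names `shared_bits`, `store_double`, `load_double`, and `u64_always_lock_free` are hypothetical. The compiler and its runtime choose the mechanism: the target's native instructions where the type is lock-free, otherwise a lock-based fallback.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(double) == sizeof(uint64_t),
               "this sketch assumes a 64-bit double");

/* Hypothetical shared slot holding the bit pattern of a double. */
static _Atomic uint64_t shared_bits;

/* Nonzero when long long atomics are always lock-free on this target
   (assumption: uint64_t has the same width as long long). */
static int u64_always_lock_free(void)
{
    return ATOMIC_LLONG_LOCK_FREE == 2;
}

/* Write the whole 64-bit pattern in one atomic store, so a reader
   never sees half of the old value and half of the new one. */
static void store_double(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    atomic_store_explicit(&shared_bits, bits, memory_order_release);
}

/* Read the whole 64-bit pattern in one atomic load. */
static double load_double(void)
{
    uint64_t bits = atomic_load_explicit(&shared_bits, memory_order_acquire);
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}
```

Where `u64_always_lock_free()` returns 0, the implementation may fall back to locks, which is roughly the "federation of hashed locks" option raised in (c).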
>
>>
>> Thanks,
>>
>> Andrei
>> _______________________________________________
>> dmd-concurrency mailing list
>> dmd-concurrency at puremagic.com
>> http://lists.puremagic.com/mailman/listinfo/dmd-concurrency