[dmd-concurrency] word tearing status in today's processors

Wed Jan 27 13:42:18 PST 2010

Thanks, Robert. This is very useful!

Andrei

Robert Jacques wrote:
> On Wed, 27 Jan 2010 10:10:49 -0500, Andrei Alexandrescu 
> <andrei at erdani.com> wrote:
> 
>> Hello,
>>
>>
>> I'm looking for _hard data_ on how today's processors address word 
>> tearing. As usual, googling for word tearing yields the usual mix of 
>> vague information, folklore, and opinionated newsgroup discussions.
>>
>> In particular:
>>
>> a) Can we assume that all or most of today's processors are able to 
>> write memory at byte level?
> 
> Not sure. Both x86 and ARM seem to have byte-store instructions.
> 
>> b) If not, is it reasonable to have the compiler insert for sub-word 
>> shared assignments a call to a function that avoids word tearing by 
>> means of a CAS loop?
> 
> Yes, in general, though on x86 xchg (not CAS) should be used instead.
> 
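The CAS loop from (b) can be sketched as follows. This is only an illustration in C++ with std::atomic, assuming a target that can update nothing smaller than an aligned 32-bit word; store_byte_cas and its byte_index convention are made-up names, not anything a compiler actually emits.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper: replace one byte of a shared 32-bit word without
// tearing its three neighbours. byte_index counts from the least
// significant byte (0..3); this numbering is an assumption of the sketch.
inline void store_byte_cas(std::atomic<std::uint32_t>& word,
                           unsigned byte_index, std::uint8_t value) {
    assert(byte_index < 4);
    const unsigned shift = byte_index * 8;
    const std::uint32_t mask = std::uint32_t{0xFF} << shift;

    std::uint32_t expected = word.load(std::memory_order_relaxed);
    std::uint32_t desired;
    do {
        // Rebuild the word from the most recently observed contents, so a
        // concurrent write to a neighbouring byte is never overwritten.
        desired = (expected & ~mask) | (std::uint32_t{value} << shift);
        // On failure compare_exchange_weak reloads `expected` for us.
    } while (!word.compare_exchange_weak(expected, desired,
                                         std::memory_order_release,
                                         std::memory_order_relaxed));
}
```

A target with real byte stores does not need any of this; the loop only pays off where the hardware forces a read-modify-write of the whole word.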
>> c) For 64-bit data (long and double), am I right in assuming that all 
>> non-ancient Intel32 processors do offer a means to atomically assign 
>> 64-bit data? (What are those asm instructions?) For processors that 
>> don't (Intel or not), can we/should we guarantee at the language level 
>> that 64-bit writes are atomic? We could effect that by using e.g. a 
>> federation of hashed locks, or even (gasp!) two global locks, one for 
>> long and one for double, and do something cleverer when public outrage 
>> puts our lives in danger. Java guarantees atomic assignment for 
>> volatile data, but I'm not sure what mechanisms implementations use.
> 
> The instruction you're looking for is CMPXCHG8B for 32-bit x86 CPUs. 
> It's been around since the Pentium (the 486 only added the 32-bit 
> CMPXCHG). Other CPUs generally use a load-linked/store-conditional 
> (LL/SC) pair. From Wikipedia:
> All of Alpha, PowerPC, MIPS, and ARM have LL/SC instructions: 
> ldl_l/stl_c and ldq_l/stq_c (Alpha), lwarx/stwcx (PowerPC), ll/sc 
> (MIPS), and ldrex/strex (ARM version 6 and above).
> 
> Most platforms provide multiple sets of instructions for different data 
> sizes, e.g. ldarx/stdcx for doubleword on the PowerPC.
> Some CPUs require the address being accessed exclusively to be 
> configured in write-through mode.
> Some CPUs track the load-linked address at a cache-line or other 
> granularity, such that any modification to any portion of the cache line 
> (whether via another core's store-conditional or merely by an ordinary 
> store) is sufficient to cause the store-conditional to fail.
> All of these platforms provide weak LL/SC. The PowerPC implementation is 
> the strongest, allowing an LL/SC pair to wrap loads and even stores to 
> other cache lines. This allows it to implement, for example, lock-free 
> reference counting in the face of changing object graphs with arbitrary 
> counter reuse (which otherwise requires DCAS).
> 
> And from an ARM website (STREXD is 64-bit):
> ARM LDREX and STREX are available in ARMv6 and above.
> ARM LDREXB, LDREXH, LDREXD, STREXB, STREXD, and STREXH are available in 
> ARMv6K and above.
> All these 32-bit Thumb instructions are available in ARMv6T2 and above, 
> except that LDREXD and STREXD are not available in the ARMv7-M profile.
> 
> ARM has also had a swap-byte instruction since v4, which may or may not 
> be equivalent to LDREXB/STREXB.
> 
> So I think it's safe to say that 64-bit writes will be efficient on most 
> CPUs out there and making a language level guarantee is okay.
> 
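For what a CMPXCHG8B- or LL/SC-based 64-bit assignment amounts to, here is a rough C++ sketch using std::atomic. The helper names (store_u64_cas, store_double_cas, load_double) are invented for this example, and the assert encodes the sketch's assumption that the target has a native 64-bit compare-and-swap. Which instructions a given compiler really generates is not something this code shows.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: an all-or-nothing 64-bit store written as an
// explicit CAS loop, the shape a CMPXCHG8B or LL/SC sequence would take.
inline void store_u64_cas(std::atomic<std::uint64_t>& target,
                          std::uint64_t value) {
    // Assumption of this sketch: 64-bit atomics are native, not lock-based.
    assert(target.is_lock_free());
    std::uint64_t observed = target.load(std::memory_order_relaxed);
    while (!target.compare_exchange_weak(observed, value,
                                         std::memory_order_seq_cst,
                                         std::memory_order_relaxed)) {
        // `observed` was refreshed with the current contents; just retry.
    }
}

// A double goes through the same path by copying its bit pattern.
inline void store_double_cas(std::atomic<std::uint64_t>& target,
                             double value) {
    static_assert(sizeof(double) == sizeof(std::uint64_t),
                  "sketch assumes a 64-bit double");
    std::uint64_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    store_u64_cas(target, bits);
}

inline double load_double(const std::atomic<std::uint64_t>& target) {
    const std::uint64_t bits = target.load();
    double value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}
```

In ordinary code a plain target.store(value) is the better choice, since it lets the implementation pick the cheapest atomic sequence; the loop is spelled out here only to make the mechanism visible.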
> Warning: most of this came from some quick Google searches, so I don't 
> know if there are other gotchas out there.
> 
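For processors with no 64-bit atomic instruction, the "federation of hashed locks" fallback from (c) might look roughly like this in C++. Everything here is an assumption made for illustration: the striped namespace, the stripe count of 64, and the address hashing are not taken from any existing runtime.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <mutex>

namespace striped {

constexpr std::size_t kStripes = 64;  // arbitrary size chosen for the sketch

// Map an address to one lock in a fixed table. Dropping the low three
// bits sends every byte of an aligned 8-byte object to the same stripe.
inline std::mutex& lock_for(const void* addr) {
    assert(addr != nullptr);
    static std::array<std::mutex, kStripes> locks;
    const auto key = reinterpret_cast<std::uintptr_t>(addr) >> 3;
    return locks[std::hash<std::uintptr_t>{}(key) % kStripes];
}

// The write is atomic only with respect to other callers of store64 and
// load64; a plain access that bypasses the lock can still see a torn value.
inline void store64(std::uint64_t* addr, std::uint64_t value) {
    std::lock_guard<std::mutex> guard(lock_for(addr));
    *addr = value;
}

inline std::uint64_t load64(const std::uint64_t* addr) {
    std::lock_guard<std::mutex> guard(lock_for(addr));
    return *addr;
}

}  // namespace striped
```

Unrelated addresses that hash to the same stripe contend with each other, so the table size trades memory for contention. The two-global-locks variant is the same idea with a table of one lock per type.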
>>
>> Thanks,
>>
>> Andrei
>> _______________________________________________
>> dmd-concurrency mailing list
>> dmd-concurrency at puremagic.com
>> http://lists.puremagic.com/mailman/listinfo/dmd-concurrency