Something needs to happen with shared, and soon.

Alex Rønne Petersen alex at lycus.org
Wed Nov 14 07:14:50 PST 2012


On 14-11-2012 15:50, deadalnix wrote:
> On 14/11/2012 15:39, Alex Rønne Petersen wrote:
>> On 14-11-2012 15:14, Andrei Alexandrescu wrote:
>>> On 11/14/12 1:19 AM, Walter Bright wrote:
>>>> On 11/13/2012 11:56 PM, Jonathan M Davis wrote:
>>>>> Being able to have double-checked locking work would be valuable,
>>>>> and having memory barriers would reduce race condition weirdness
>>>>> when locks aren't used properly, so I think that it would be
>>>>> desirable to have memory barriers.
>>>>
>>>> I'm not saying "memory barriers are bad". I'm saying that having the
>>>> compiler blindly insert them for shared reads/writes is far from the
>>>> right way to do it.
>>>
>>> Let's not be hasty. That works for Java and C#, and is allowed in C++.
>>>
>>> Andrei
>>>
>>>
>>
>> I need some clarification here: By memory barrier, do you mean x86's
>> mfence, sfence, and lfence? Because, as Walter said, inserting those
>> blindly when they're unnecessary can lead to terrible performance; it
>> practically murders pipelining.
>>
>
> In fact, x86 is mostly sequentially consistent due to its memory
> model. It only requires an mfence when a shared store is followed by
> a shared load.

I just used x86's fencing instructions as an example because most
people here are familiar with that architecture. The problem is much,
much bigger on weakly ordered architectures like ARM, MIPS, and
PowerPC.
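
To make that concrete, here is a rough sketch (the names are mine, and
I'm assuming druntime's core.atomic) of a sequentially consistent
publish/consume pair. On x86 the stores lower to plain movs plus at
most one mfence (or an xchg), while on ARM and PowerPC each of these
operations needs explicit barrier instructions (dmb / sync), which is
exactly where blind insertion gets expensive:

import core.atomic;

shared int flag;
shared int data;

void publish()
{
    // Both stores are seq-cst; the compiler may not reorder them.
    atomicStore!(MemoryOrder.seq)(data, 42);
    atomicStore!(MemoryOrder.seq)(flag, 1);
}

void consume()
{
    // Spin until the flag is visible; the data must then be visible too.
    while (atomicLoad!(MemoryOrder.seq)(flag) == 0) {}
    assert(atomicLoad!(MemoryOrder.seq)(data) == 42);
}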

>
> See http://g.oswego.edu/dl/jmm/cookbook.html for more information on
> the barriers required on different architectures.
>
>> (And note that you can't optimize this either; since the dependencies
>> memory barriers are supposed to express are subtle and not detectable by
>> a compiler, the compiler would always have to insert them because it
>> can't know when it would be safe not to.)
>>
>
> The compiler is aware of what is thread-local and what isn't. That
> means it can fully optimize TL stores and loads (e.g. doing register
> promotion, or reordering them across shared stores/loads).

Thread-local loads and stores are not atomic and thus do not take part 
in the reordering constraints that atomic operations impose. See e.g. 
the LLVM docs for atomicrmw and atomic load/store.
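
As a concrete illustration (my own example, again assuming
core.atomic), the compiler is free to keep a thread-local counter in a
register for an entire loop, while a shared counter has to go through
memory with whatever ordering the atomic operation demands:

import core.atomic;

int localCount;          // thread-local: D's default storage class
shared int sharedCount;

void hot()
{
    foreach (i; 0 .. 1_000)
        ++localCount;    // may be promoted to a register and written
                         // back once after the loop

    foreach (i; 0 .. 1_000)
        atomicOp!"+="(sharedCount, 1);  // a real atomic RMW on every
                                        // iteration; no promotion
}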

>
> This has a cost, indeed, but it is useful, and Walter's solution of
> casting away shared when a mutex is acquired is always available.
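
For reference, that idiom looks something like this (a minimal sketch
with my own names, assuming core.sync.mutex; the mutex provides the
ordering, so no per-access barriers are needed inside the lock):

import core.sync.mutex;

shared int counter;
__gshared Mutex counterLock;

shared static this()
{
    counterLock = new Mutex();
}

void bump()
{
    synchronized (counterLock)
    {
        // Inside the critical section, cast away shared and work on
        // the data through an unshared view.
        auto p = cast(int*) &counter;
        ++*p;
    }
}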

-- 
Alex Rønne Petersen
alex at lycus.org
http://lycus.org
