<p dir="ltr"><br>

On 7 Feb 2014 15:45, "Sean Kelly" &lt;<a href="mailto:sean@invisibleduck.org">sean@invisibleduck.org</a>&gt; wrote:<br>

><br>

> On Friday, 7 February 2014 at 11:17:49 UTC, Stanislav Blinov wrote:<br>

>><br>

>> On Friday, 7 February 2014 at 08:10:58 UTC, Sean Kelly wrote:<br>

>>><br>

>>> Weird.  atomicLoad(raw) should be the same as atomicLoad(acq), and atomicStore(raw) should be the same as atomicStore(rel).  At least on x86.  I don't know why that change made a difference in performance.<br>


>><br>

>><br>

>> huh?<br>

>><br>

>> --8<-- core/atomic.d<br>

>><br>

>>         template needsLoadBarrier( MemoryOrder ms )<br>

>>         {<br>

>>             enum bool needsLoadBarrier = ms != MemoryOrder.raw;<br>

>>         }<br>

>><br>

>> -->8--<br>

>><br>

>> Didn't you write this? :)<br>

><br>

><br>

> Oops.  I thought that since Intel has officially defined loads as having acquire semantics, I had eliminated the barrier requirement there.  But I guess not.  I suppose it's an issue worth discussing.  Does anyone know offhand what C++0x implementations do for load acquires on x86?</p>


<p dir="ltr">Speaking of which, I need to add 'Update gcc.atomics to use new C++0x intrinsics' to the GDCProjects page. The new __atomic intrinsics take an explicit memory-order argument, so they map closely to what core.atomic is doing, and should perform better than the __sync intrinsics, which generally imply a full barrier.  :)</p>
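<p dir="ltr">For reference, a rough sketch of the difference between the two intrinsic families, written as C++ against GCC's builtins. The helper names (loadAcquire, storeRelease, fetchAddRelaxed, fetchAddSync) are made up for illustration, and the comments pairing them with core.atomic calls are my assumption of how such a mapping might look, not what gcc.atomics currently does.</p>

```cpp
#include <cstdint>

// Hypothetical helpers; only the __atomic_* and __sync_* builtins are GCC's.

// Acquire load: roughly what atomicLoad!(MemoryOrder.acq) asks for (assumed mapping).
// On x86 this is typically an ordinary load with no extra fence instruction.
inline std::uint32_t loadAcquire(const std::uint32_t* p)
{
    return __atomic_load_n(p, __ATOMIC_ACQUIRE);
}

// Release store: roughly what atomicStore!(MemoryOrder.rel) asks for (assumed mapping).
inline void storeRelease(std::uint32_t* p, std::uint32_t v)
{
    __atomic_store_n(p, v, __ATOMIC_RELEASE);
}

// The newer builtins let the caller pick the ordering, here the weakest one.
// Returns the value held before the addition.
inline std::uint32_t fetchAddRelaxed(std::uint32_t* p, std::uint32_t n)
{
    return __atomic_fetch_add(p, n, __ATOMIC_RELAXED);
}

// The older builtin has no ordering parameter and is documented as a full barrier.
// Returns the value held before the addition.
inline std::uint32_t fetchAddSync(std::uint32_t* p, std::uint32_t n)
{
    return __sync_fetch_and_add(p, n);
}
```

<p dir="ltr">The point is only that the ordering becomes a per-call argument with the newer builtins, so a raw or acquire load does not have to pay for a barrier it did not ask for.</p>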