<p dir="ltr">On 8 Feb 2014 01:20, "Marco Leise" <<a href="mailto:Marco.Leise@gmx.de">Marco.Leise@gmx.de</a>> wrote:<br>

><br>

> Am Fri, 7 Feb 2014 18:42:29 +0000<br>

> schrieb Iain Buclaw <<a href="mailto:ibuclaw@gdcproject.org">ibuclaw@gdcproject.org</a>>:<br>

><br>

> > On 7 Feb 2014 15:45, "Sean Kelly" <<a href="mailto:sean@invisibleduck.org">sean@invisibleduck.org</a>> wrote:<br>

> > ><br>

> > > On Friday, 7 February 2014 at 11:17:49 UTC, Stanislav Blinov wrote:<br>

> > >><br>

> > >> On Friday, 7 February 2014 at 08:10:58 UTC, Sean Kelly wrote:<br>

> > >>><br>

> > >>> Weird.  atomicLoad(raw) should be the same as atomicLoad(acq), and<br>

> > >>> atomicStore(raw) should be the same as atomicStore(rel).  At least on x86.<br>

> > >>> I don't know why that change made a difference in performance.<br>

> > >><br>

> > >><br>

> > >> huh?<br>

> > >><br>

> > >> --8<-- core/atomic.d<br>

> > >><br>

> > >>         template needsLoadBarrier( MemoryOrder ms )<br>

> > >>         {<br>

> > >>             enum bool needsLoadBarrier = ms != MemoryOrder.raw;<br>

> > >>         }<br>

> > >><br>

> > >> -->8--<br>

> > >><br>

> > >> Didn't you write this? :)<br>

> > ><br>

> > ><br>

> > > Oops.  I thought that since Intel has officially defined loads as having<br>

> > > acquire semantics, I had eliminated the barrier requirement there.  But I<br>

> > > guess not.  I suppose it's an issue worth discussing.  Does anyone know<br>

> > > offhand what C++0x implementations do for load acquires on x86?<br>

> ><br>

> > Speaking of which, I need to add 'Update gcc.atomics to use new C++0x<br>

> > intrinsics' to the GDCProjects page - they map closely to what core.atomic<br>

> > is doing, and should see better performance compared to the __sync<br>

> > intrinsics.  :)<br>

><br>

> You send shared variables as "volatile" to the backend and<br>

> that is correct. Since that should create strong<br>

> ordering of memory operations (correct?), I wonder if DMD has something<br>

> similar, or if D's "shared" isn't really shared at all and<br>

> relies entirely on the correct use of atomicLoad/atomicStore<br>

> and atomicFence. In that case, would the GCC backend be able to<br>

> optimize more around shared variables (by not considering them<br>

> volatile) and still be no worse off than DMD?<br>

></p>
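<p dir="ltr">To make the quoted discussion of memory orders and GCC intrinsics concrete, here is a minimal C sketch, assuming GCC or Clang. The legacy <code>__sync_fetch_and_add</code> builtin always acts as a full barrier, while the C++11-style <code>__atomic</code> builtins take an explicit memory order. The comments relate the loads to <code>atomicLoad!(MemoryOrder.raw)</code> and <code>atomicLoad!(MemoryOrder.acq)</code> only loosely; this is not how core.atomic or gcc.atomics is actually implemented, and the names <code>counter</code>, <code>increment_sync</code>, <code>increment_relaxed</code>, <code>load_raw</code> and <code>load_acquire</code> are hypothetical.</p>

```c
/* Hypothetical shared counter, used only for illustration. */
static long counter = 0;

/* Legacy __sync builtin: the ordering is fixed; it always acts as a full barrier. */
static long increment_sync(void)
{
    return __sync_fetch_and_add(&counter, 1);   /* returns the old value */
}

/* C++11-style __atomic builtin: the caller states the ordering, so e.g. a
   statistics counter can ask for relaxed ordering instead of a full barrier. */
static long increment_relaxed(void)
{
    return __atomic_fetch_add(&counter, 1, __ATOMIC_RELAXED);   /* old value */
}

/* Loosely comparable to atomicLoad!(MemoryOrder.raw): no ordering constraint. */
static long load_raw(void)
{
    return __atomic_load_n(&counter, __ATOMIC_RELAXED);
}

/* Loosely comparable to atomicLoad!(MemoryOrder.acq). On x86 this is typically
   the same plain mov as the relaxed load; the difference is that the compiler
   (and weaker CPUs) may not move later memory accesses above it. */
static long load_acquire(void)
{
    return __atomic_load_n(&counter, __ATOMIC_ACQUIRE);
}
```

<p dir="ltr">For example, starting from zero, <code>increment_sync()</code> returns 0, <code>increment_relaxed()</code> returns 1, and both loads then read 2. Whether the relaxed or the acquire form is sufficient depends on what the surrounding code needs ordered, not on which instruction x86 happens to emit.</p>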

<p dir="ltr">No. My decision to mark shared data volatile was *not* made because of strong ordering. Remember, we follow C semantics here, which are quite specific in not guaranteeing this.</p>

<p dir="ltr">The reason it is set as volatile is that volatile instead guarantees the compiler will not generate code that caches the shared data (for example, in a register).</p>
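<p dir="ltr">A minimal C sketch of this distinction, assuming GCC or Clang: <code>volatile</code> forces every access to actually be performed, so a polling loop cannot be reduced to one cached read, but it says nothing about ordering against ordinary memory accesses or across threads; that has to come from an atomic operation or fence. All names here (<code>stop_requested</code>, <code>spin_until_stopped</code>, <code>publish_result</code>, <code>read_result</code>) are hypothetical. Note also that under C11 a concurrent plain write to the <code>volatile</code> flag would still be a data race, which is one more reason to use atomics for inter-thread communication.</p>

```c
#include <stdbool.h>

/* Hypothetical flag, polled by the loop below. */
static volatile int stop_requested = 0;

/* volatile forces a fresh load of stop_requested on every iteration, so the
   compiler cannot hoist the read out of the loop and spin on a register copy.
   It does NOT order this read against ordinary (non-volatile) accesses. */
static long spin_until_stopped(long max_iterations)
{
    long i = 0;
    while (!stop_requested && i < max_iterations)
        i++;
    return i;
}

/* Ordering has to come from atomics instead: the release store makes the
   earlier write to result visible to a thread whose acquire load sees 1. */
static int result = 0;
static int flag = 0;

static void publish_result(int value)
{
    result = value;                                  /* ordinary write */
    __atomic_store_n(&flag, 1, __ATOMIC_RELEASE);    /* publish */
}

static bool read_result(int *out)
{
    if (!__atomic_load_n(&flag, __ATOMIC_ACQUIRE))   /* not published yet */
        return false;
    *out = result;                                   /* ordered after the acquire */
    return true;
}
```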