DMD 1.029 and 2.013 releases

Russell Lewis webmaster at villagersonline.com
Thu Apr 24 15:43:31 PDT 2008


Sean Kelly wrote:
> == Quote from Russell Lewis (webmaster at villagersonline.com)'s article
>> Sean Kelly wrote:
>>> 1) The cost of acquiring or committing a lock is generally roughly equivalent to
>>>      a memory synchronization, and sometimes less than that (futexes, etc).  So
>>>      it's not insignificant, but also not as bad as people seem to think.  I suspect
>>>      that locked operations are often subject to premature optimization.
>> What exactly do you mean by "memory synchronization?"  Just a write
>> barrier instruction, or something else?
>> If what you mean is a write barrier, then what you said isn't
>> necessarily true, especially as we head toward more and more cores, and
>> thus more and more caches.  Locks are almost always atomic
>> read/modify/write operations, and those can cause terrible cache
>> bouncing problems.  If you have N cores (each with its own cache) race
>> for the same lock (even if they are trying to get shared locks), you can
>> have up to N^2 bounces of the cache line around.
> 
> Yeah I meant an atomic RMW, or at least a load barrier for the acquire.  Releasing
> a mutex can often be done using a plain old store though, since write ops are
> typically ordered anyway and moving loads up into the mutex doesn't break
> anything.  My point, however, was simply that mutexes aren't terribly slower than
> atomic operations, since a mutex acquire/release is really little more than an atomic
> operation itself, at least in the simple case.
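To illustrate the point in the quote above, here is a minimal (hypothetical) spinlock sketch in C11: acquiring the lock is an atomic read-modify-write, while releasing it can be little more than a store with release ordering. The names `spinlock_t`, `spin_lock`, and `spin_unlock` are mine, not from the thread.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical minimal spinlock.  Acquire is an atomic RMW (the
   expensive part); release is just a store with release ordering. */
typedef struct {
    atomic_bool locked;
} spinlock_t;

static void spin_lock(spinlock_t *l) {
    /* Atomic read-modify-write with acquire semantics. */
    while (atomic_exchange_explicit(&l->locked, true,
                                    memory_order_acquire))
        ;  /* spin until the previous holder releases */
}

static void spin_unlock(spinlock_t *l) {
    /* A plain store with release ordering suffices: later loads may
       move up into the critical section, but nothing leaks out. */
    atomic_store_explicit(&l->locked, false, memory_order_release);
}
```

This matches the observation that a mutex acquire/release is, in the uncontended case, little more than one atomic operation plus an ordinary store.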

Ah, now I get what you were saying.  Yes, I agree that atomic 
instructions are not likely to be much faster than mutexes.  (Of 
course, pthread mutexes, when they sleep, are a whole 'nother beast.)
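As a sketch of why a sleeping pthread mutex is a different beast: the uncontended fast path is roughly one atomic RMW in userspace, and only under contention does the implementation fall back to a futex-style sleep in the kernel, which is far more expensive. The `worker` function and iteration count here are illustrative, not from the thread.

```c
#include <pthread.h>

/* Illustrative: uncontended pthread_mutex_lock is roughly one atomic
   RMW in userspace; only on contention does it sleep in the kernel
   (e.g. via a futex on Linux), which costs far more. */
static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static long counter;

void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&m);   /* fast path: atomic RMW */
        counter++;
        pthread_mutex_unlock(&m); /* often a store + possible wakeup */
    }
    return NULL;
}
```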

What I thought you were referring to were barriers, which (in the 
many-cache case) are *far* faster than atomic operations; that is 
why I disagreed in my previous post.
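A sketch of that distinction, under my own naming (`publish`, `contend`, and the variables are hypothetical): a barrier only constrains the ordering of the issuing core's own memory operations, while an atomic RMW must pull the cache line into its cache in exclusive state, so N cores racing on one line make it ping-pong between caches.

```c
#include <stdatomic.h>

int data;          /* plain payload */
atomic_int ready;  /* publication flag */

void publish(int v) {
    data = v;
    /* Write barrier: orders the store to data before the store to
       ready.  Purely local; no extra cross-core traffic is needed. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

atomic_int hits;   /* shared counter */

void contend(void) {
    /* Atomic RMW: each caller must own the cache line holding hits
       exclusively, so with N racing cores the line bounces between
       caches -- the effect described above. */
    atomic_fetch_add_explicit(&hits, 1, memory_order_relaxed);
}
```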


More information about the Digitalmars-d-announce mailing list