[dmd-concurrency] draft 7

Fawzi Mohamed fawzi at gmx.ch
Tue Feb 2 06:15:09 PST 2010


On 2-feb-10, at 15:04, Fawzi Mohamed wrote:

>
> On 2-feb-10, at 07:47, Sean Kelly wrote:
>
>> On Feb 1, 2010, at 10:24 PM, Andrei Alexandrescu wrote:
>>>
>>> If you or I don't know of any machine that would keep on reading a  
>>> datum from its own cache instead of consulting the main memory,  
>>> we're making an assumption. The right thing to do is to keep the  
>>> compiler _AND_ the machine in the loop by using appropriate fencing.
>>
>> I did some googling before my last reply to try and find a NUMA  
>> architecture that works this way, but I didn't have any luck.   
>> Cache coherency is just too popular these days.  But it seems  
>> completely reasonable to include such an assertion in a standard as  
>> broadly applicable as POSIX.  I'm not sure I agree that fencing is  
>> always necessary though--we're really operating at the level of  
>> "assumptions regarding hardware behavior" here.
>
> My imaginary hardware model is the following:
>
> Several processors, each with a separate cache.
> Operations on the cache are kept in a kind of journal and  
> communicated to the other processors.
> A processor continuously updates its cache and sends its updates to  
> the other processors (to me it makes no sense to skip this; if you  
> do, then you don't have a shared memory system).
> Different processors may see the work of other processors delayed  
> or out of order; in fact, most likely they do (for performance  
> reasons).
>
> * barriers
> A write barrier ensures that all writes done on the cache of  
> processor X (where the barrier was issued) are communicated to the  
> caches of the other processors *before* any subsequent write.
> A read barrier ensures that all reads done on processor Y (where the  
> barrier was issued) before the barrier are completed before any read  
> after the barrier.
> Using them one can ensure that if a read before the read barrier on  
> Y sees a change that was done after the write barrier on X, then all  
> changes done before the write barrier on X are visible after the  
> read barrier on Y.
> Thus barriers can introduce a weak ordering.
> That is all they do: they don't synchronize, or ensure that a change  
> becomes visible; in time all changes become visible anyway,  
> otherwise you are not working on a shared memory machine (NUMA or  
> not).
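>
> As a minimal sketch of this flag-passing pattern (I use the  
> atomicFence/atomicLoad/atomicStore names of druntime's core.atomic  
> here, not the Tango module discussed below, so treat the exact names  
> as an assumption):
>
> import core.atomic : atomicFence, atomicLoad, atomicStore, MemoryOrder;
>
> shared int  data;
> shared bool ready;
>
> // processor X: publish data, then raise the flag
> void producer() {
>     atomicStore!(MemoryOrder.raw)(data, 42);
>     atomicFence(); // "write barrier": data reaches other caches before ready
>     atomicStore!(MemoryOrder.raw)(ready, true);
> }
>
> // processor Y: once it sees the flag, the barrier pair makes data visible
> void consumer() {
>     while (!atomicLoad!(MemoryOrder.raw)(ready)) {}
>     atomicFence(); // "read barrier": the read of data cannot move before this
>     assert(atomicLoad!(MemoryOrder.raw)(data) == 42);
> }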
>
> * atomic load/stores
> Atomic loads and stores don't really change much, but they ensure  
> that a change is done at once. Their cost (if the hardware supports  
> them) is typically very small, and often, when the hardware supports  
> a given size, plain loads and stores of that size are already atomic  
> (64-bit values on 32-bit processors are an exception).
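>
> For example (same core.atomic assumption as above; the 64-bit case  
> on a 32-bit processor is exactly where a plain access could tear):
>
> import core.atomic : atomicLoad, atomicStore;
>
> shared long counter64; // a plain access on a 32-bit CPU may be done in halves
>
> void writer() {
>     // the store happens at once: no reader ever sees half of the new value
>     atomicStore(counter64, 0x1234_5678_9abc_def0L);
> }
>
> long reader() {
>     return atomicLoad(counter64); // likewise tear-free
> }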
>
> * atomic update
> This is something much stronger: it ensures that an update is  
> immediately globally seen. To keep its cost reasonable it covers  
> just one memory location (some work has to be done so that the  
> update is perceived as immediate).
> An atomic update a priori does not imply any kind of barrier for  
> other memory; only the updated memory location is synchronized.
> Atomic operations, when used in combination with barriers, can be  
> used to implement locks, unique counters, and so on.
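>
> A sketch of a unique counter built this way (again assuming  
> core.atomic: atomicOp wraps the hardware's atomic read-modify-write,  
> and the second variant shows the same update as an explicit  
> compare-and-swap loop):
>
> import core.atomic : atomicLoad, atomicOp, cas;
>
> shared int nextId;
>
> // every caller gets a distinct value, with no lock
> int uniqueId() {
>     return atomicOp!"+="(nextId, 1);
> }
>
> // the same update written as a compare-and-swap loop
> int uniqueIdCas() {
>     int old;
>     do {
>         old = atomicLoad(nextId);
>     } while (!cas(&nextId, old, old + 1)); // retry if someone updated first
>     return old + 1;
> }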
>
> That is about it. You will not find this hardware definition written  
> down anywhere, but it is my conceptual model, and I think it can be  
> applied to all the hardware I know of and, I believe, to future  
> hardware too.
>
> You can see that barriers don't imply immediate synchronization  
> (unlike atomic updates) but order *all* changes done. Itanium was  
> toying with the idea of more local barriers that synchronize only  
> part of the memory, but that is difficult to program for (the  
> compiler could probably use them in some cases), and I have ignored  
> it here.
>
> Anyway, this is my understanding of the situation, and I think it is  
> pretty good.
>
> By the way, I found the atomic module in Tango difficult to use  
> correctly (maybe I had not understood it), and I rewrote it.
> Due to the recent reorganization in Tango an older (buggy) version  
> was resurrected, and I don't know what will happen now; probably the  
> new one will be merged in. If you are interested, the latest version  
> is in blip:
> 	http://github.com/fawzi/blip/blob/master/blip/sync/Atomic.d
> It is Apache 2.0, but I am willing to relicense it to whatever is  
> deemed useful.
>
> As a note, I would like to say that atomic updates, while useful for  
> implementing some operations, should in my opinion not be the method  
> of choice for parallel programming (they are difficult to compose);  
> still, in D they should definitely be usable by those who want them  
> (i.e. I want a working volatile... at least in D1.0 ;).
>
> Fawzi

Please note that my wording for some atomic operations is quite  
strange, but if you can use them for spinlocks or similar, then the  
update has to be "effectively" instantaneous; an atomic operation  
local to a single processor is not really useful...
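
A spinlock sketch shows why (same core.atomic assumption as in the
quoted text; if the winning compare-and-swap were visible only to its
own processor, two processors could both take the lock):

import core.atomic : atomicStore, cas, MemoryOrder;

shared bool locked;

void lock() {
    // the cas must win globally, "at once", for mutual exclusion to hold
    while (!cas(&locked, false, true)) {}
}

void unlock() {
    // a store with release ordering is enough to publish the critical section
    atomicStore!(MemoryOrder.rel)(locked, false);
}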

