[dmd-concurrency] draft 7
Fawzi Mohamed
fawzi at gmx.ch
Tue Feb 2 12:52:34 PST 2010
On 2-feb-10, at 18:50, Sean Kelly wrote:
> On Feb 2, 2010, at 6:04 AM, Fawzi Mohamed wrote:
>>
>> My imaginary hardware model is the following:
>>
>> several processors, each have a separate cache.
>> Operations on the cache are kept in a kind of journal and
>> communicated to other processors.
>> A processor continuously updates its cache and sends its updates to
>> the other processors (skipping this makes no sense to me; if you do,
>> you no longer have a shared-memory system).
>> Different processors may see the work of other processors delayed
>> or out of order; in fact, for performance reasons, they most likely
>> do.
>
> The weird part is that a processor may see the work of another
> processor out of order because of its own load reordering rather
> than because the stores were issued out of order. I think this is
> why Bartosz has said that you need a barrier at both the read and
> write locations, and I guess this is the handshake Andrei mentioned.
Yes, indeed the barriers you typically need are the ones I describe
later, and I suppose that is what Andrei meant by handshake.
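As a minimal sketch of that pairing, written against druntime's
core.atomic (the names here are from current D2 and may differ from
the module as it existed at the time):

    import core.atomic;

    shared int  data;
    shared bool ready;

    void producer()
    {
        atomicStore!(MemoryOrder.raw)(data, 42);
        // release: all earlier stores become visible before the flag
        atomicStore!(MemoryOrder.rel)(ready, true);
    }

    void consumer()
    {
        // acquire: no later load may be hoisted above this one
        while (!atomicLoad!(MemoryOrder.acq)(ready)) {}
        assert(atomicLoad!(MemoryOrder.raw)(data) == 42);
    }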
>> * barriers
>> A write barrier ensures that all writes done on the cache of
>> processor X (where the barrier was issued) are communicated to the
>> caches of other processors *before* any subsequent write.
>> A read barrier ensures that all reads done on processor Y (where
>> the barrier was issued) before the barrier are completed before any
>> read after the barrier.
>
> I'd call these a hoist-store barrier and a hoist-load barrier (Alex
> Terekhov's terminology). SPARC would call them a StoreStore and a
> LoadLoad barrier, I believe. I can never keep the SPARC terminology
> straight because they use "load" to represent both an actual load
> from memory and something about what type of code movement the
> barrier prevents, so the first word is one and the second word is
> the other. Saying that a load or store has acquire semantics is a
> stronger guarantee because it constrains the movement of both loads
> and stores.
>> * atomic load/stores
>> Atomic loads and stores don't really change much, but they ensure
>> that a change happens all at once. Their cost (if the hardware
>> supports them) is typically very small, and often, when supported
>> for a given size, they are always used (64-bit values on 32-bit
>> processors are an exception).
>
> I've always considered atomic operations to only guarantee that the
> operation happens as an indivisible unit. It may still be delayed,
> and perhaps longer than an op with a barrier since the CPU could
> reorder operations to occur before it in some cases. For example, a
> MOV instruction on x86 is atomic but there's no barrier.
Indeed.
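To make the distinction concrete, a sketch assuming core.atomic:
MemoryOrder.raw gives an indivisible access, comparable to a plain
aligned MOV on x86, with no barrier attached:

    import core.atomic;

    shared int flag;

    void demo()
    {
        // indivisible store, but no ordering guarantee with respect
        // to surrounding loads and stores
        atomicStore!(MemoryOrder.raw)(flag, 1);
        // likewise atomic, and likewise free to be reordered
        int v = atomicLoad!(MemoryOrder.raw)(flag);
    }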
>> By the way, I found the atomic module in Tango difficult to use
>> correctly (maybe I had not understood it), and I rewrote it.
>
> What problems did you have?
Yes, I don't remember exactly, but I found the terminology unfamiliar
(hoist-store and hoist-load barriers), and I did not see the two
barriers I wanted: the ones you typically need are store-store and
load-load (they order stores wrt. stores and loads wrt. loads), which
is what I call write and read barriers.
Typically I want a flag or something like that, and I like the
barriers to be "hidden" in the operation I call.
I don't want to have to think about whether the barrier has to come
before or after the operation on each call; some seemed to be in the
wrong place to me.
When writing a flag, for example, the write barrier goes before
writing the flag, whereas when reading, the read barrier goes after
reading the flag.
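A sketch of that placement with explicit fences (atomicFence is
core.atomic's full barrier, standing in for the write/read barriers;
g_data is a made-up payload variable):

    import core.atomic;

    shared bool flag;
    __gshared int g_data;   // the payload being published

    void publish(int value)
    {
        g_data = value;
        atomicFence();      // write barrier *before* writing the flag
        atomicStore!(MemoryOrder.raw)(flag, true);
    }

    int consume()
    {
        while (!atomicLoad!(MemoryOrder.raw)(flag)) {}
        atomicFence();      // read barrier *after* reading the flag
        return g_data;
    }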
The old module, by contrast, used this (to me, opaque) terminology,
and you more or less had to pass it to each call; furthermore, the
barrier operations themselves looked wrong to me (lock is not
enough).
It is entirely possible that I simply did not understand how it was
supposed to work; in any case I find (clearly ;) my version clearer,
and I can use it correctly.
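For illustration, the kind of interface I mean, with the barrier
folded into the operation (flagSet/flagGet are hypothetical names,
not the actual module's API):

    import core.atomic;

    // barrier placed before the store, hidden from the caller
    void flagSet(ref shared bool flag)
    {
        atomicStore!(MemoryOrder.rel)(flag, true);
    }

    // barrier placed after the load, hidden from the caller
    bool flagGet(ref shared bool flag)
    {
        return atomicLoad!(MemoryOrder.acq)(flag);
    }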
Fawzi