Multicores and Publication Safety

Tue Aug 5 10:41:12 PDT 2008

Jb wrote:
> "Sean Kelly" <sean at invisibleduck.org> wrote in message 
> news:g79ugv$mdd$1 at digitalmars.com...
>> Jb wrote:
>>> "Sean Kelly" <sean at invisibleduck.org> wrote in message 
>>> news:g78man$17sb$1 at digitalmars.com...
>>>> Jb wrote:
>>>>> "Walter Bright" <newshound1 at digitalmars.com> wrote in message 
>>>>> news:g7855a$2sd3$1 at digitalmars.com...
>>>>>> "What memory fences are useful for on multiprocessors; and why you 
>>>>>> should care, even if you're not an assembly programmer."
>>>>>>
>>>>>> http://bartoszmilewski.wordpress.com/2008/08/04/multicores-and-publication-safety/
>>>>>>
>>>>>> http://www.reddit.com/comments/6uuqc/multicores_and_publication_safety/
>>>>> None of that is relevant on x86 as far as I understand. I could only 
>>>>> find the one regarding x86-64, but as far as I know it's the same on 
>>>>> x86-32.
>>>>>
>>>>> http://www.intel.com/products/processor/manuals/318147.pdf
>>>>>
>>>>> The key point being loads are not reordered with other loads, and 
>>>>> stores are not reordered with other stores.
>>>> Not true.  The actual behavior of IA-32 processors has been hotly 
>>>> debated, but it's been established that at least certain AMD processors 
>>>> may reorder loads.
>>> That's news to me.
>> I don't know that this was ever confirmed with anyone at AMD, but it did 
>> come up in the C++0x talks and I believe the linux kernel accounts for it.
> 
> I did a bit of googling and it does seem older AMDs were less strongly 
> ordered, particularly for SSE/3DNow non-temporal stores. But it looks 
> like they have gone for strong ordering with AMD64.
> 
> http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24593.pdf
> 
> From 7.2 : Multiprocessor Memory Ordering.
> 
> "Loads do not pass previous loads (loads are not re-ordered). Stores do not 
> pass previous stores (stores are not re-ordered)"
> 
> Although, skim-reading more of chapter 7, it looks like they might do 
> reordering behind the scenes, or "such that the appearance of in-order 
> execution is maintained" as they say.
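
For illustration, here is a minimal C++ sketch of the publication pattern the linked article discusses, written against the load-load and store-store rules quoted above. The names Payload, g_published, publish, consume and publish_and_consume are hypothetical, and the sketch assumes a compiler with C++11 <atomic> and <thread> support. It is not taken from either vendor manual.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical object being handed from one thread to another.
struct Payload { int value; };

// Hypothetical shared slot; nullptr means "nothing published yet".
std::atomic<Payload*> g_published{nullptr};

// Writer: initialise the object first, then publish the pointer.
// The release store keeps the initialisation from being ordered after it.
void publish(Payload* p, int v) {
    assert(p != nullptr);
    p->value = v;                                     // plain store
    g_published.store(p, std::memory_order_release);  // publication
}

// Reader: wait for the pointer, then read through it.
// The acquire load keeps the field read from being ordered before it.
int consume() {
    Payload* p;
    while ((p = g_published.load(std::memory_order_acquire)) == nullptr) {
        std::this_thread::yield();
    }
    return p->value;
}

// Runs one writer and one reader and returns what the reader observed.
int publish_and_consume(int v) {
    Payload payload{0};
    g_published.store(nullptr, std::memory_order_relaxed);
    int seen = -1;
    std::thread reader([&] { seen = consume(); });
    std::thread writer([&] { publish(&payload, v); });
    writer.join();
    reader.join();
    return seen;  // safe to read: both threads have been joined
}
```

Calling publish_and_consume(42) from main should return 42. If the quoted rules hold, an x86 compiler can typically implement the release store and the acquire load as ordinary moves, whereas weaker architectures would need explicit barriers at those two points.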

At least AMD and Intel have figured out how to separate discussion of 
implementation issues from visible behavior.  The original IA-32 spec 
was an absolute disaster in this respect.  I'm also encouraged that the 
memory model has been both fully specified and strengthened to PCsc or 
better.  The x86 has always been pretty easy to deal with and it's nice 
to see that this will continue to be true.  I suppose my only question 
at this point is how the official memory barrier instructions apply to 
normal (non-SSE) instruction ordering.  I don't suppose the recent specs 
say anything about this?
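
As a rough illustration of where a full barrier still matters for ordinary loads and stores, here is a minimal C++ sketch of the classic store-then-load test. It assumes the one reordering left visible on x86 is a later load passing an earlier store to a different location, and that a sequentially consistent fence maps to a full barrier (such as MFENCE or a locked instruction) on that hardware. The function name count_both_zero is hypothetical.

```cpp
#include <atomic>
#include <thread>

// Runs the test `iterations` times and returns how often both threads
// read 0, i.e. how often each load appeared to pass the other thread's store.
int count_both_zero(int iterations) {
    int both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;

        std::thread a([&] {
            x.store(1, std::memory_order_relaxed);
            // Full barrier between the store and the following load.
            std::atomic_thread_fence(std::memory_order_seq_cst);
            r1 = y.load(std::memory_order_relaxed);
        });
        std::thread b([&] {
            y.store(1, std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_seq_cst);
            r2 = x.load(std::memory_order_relaxed);
        });

        a.join();
        b.join();
        if (r1 == 0 && r2 == 0) {
            ++both_zero;
        }
    }
    return both_zero;
}
```

With both fences in place the C++ memory model rules out the both-zero outcome, so count_both_zero(200) should return 0. Removing the fences makes that outcome legal, although how often it actually shows up depends on the hardware and on timing.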

> My guess is that strong ordering, or at least the appearance of it, is an 
> important factor in multi-core CPUs scaling well.

Yup.  And the Intel announcement makes the very good point that it's a 
huge factor in performance per watt as well.  Strengthening the memory 
model and shrinking the pipeline allows for a tremendous amount of logic 
hardware to simply be thrown away, which means smaller, cooler, more 
energy-efficient CPUs.  My big question now is how computers will be 
built in the coming years... will we have a few traditional (fast) cores 
plus a general-purpose parallel computing cluster?  I suppose I should 
read that Intel paper posted yesterday.

Sean