Multicores and Publication Safety

Brad Roberts braddr at puremagic.com
Mon Aug 4 22:33:02 PDT 2008


Sean Kelly wrote:
> Brad Roberts wrote:
>> Jb wrote:
>>> "Walter Bright" <newshound1 at digitalmars.com> wrote in message
>>> news:g7855a$2sd3$1 at digitalmars.com...
>>>> "What memory fences are useful for on multiprocessors; and why you
>>>> should care, even if you're not an assembly programmer."
>>>>
>>>> http://bartoszmilewski.wordpress.com/2008/08/04/multicores-and-publication-safety/
>>>>
>>>>
>>>> http://www.reddit.com/comments/6uuqc/multicores_and_publication_safety/
>>> None of that is relevant on x86 as far as I understand. I could only
>>> find the one regarding x86-64, but as far as I know it's the same on
>>> x86-32.
>>>
>>> http://www.intel.com/products/processor/manuals/318147.pdf
>>>
>>> The key point being loads are not reordered with other loads, and
>>> stores are not reordered with other stores.
>>>
>>
>> Pay very close attention to sections 2.3 and 2.4 of that document.
> 
> 2.4 is the most interesting aspect of PC.  It means that you can run
> into situations like this:
> 
>     // Thread A
>     x = 1;
> 
>     // Thread B
>     if( x == 1 )
>         y = 1;
> 
>     // Thread C
>     if( y == 1 )
>         assert( x == 1 ); // may fail
> 
> Alex Terekhov came up with a sneaky solution for this based on how the
> IA-32 spec says CAS is currently implemented:
> 
>     // Thread A
>     x = 1;
> 
>     // Thread B
>     t = CAS( x, 0, 0 );
>     if( t == 1 )
>         y = 1;
> 
>     // Thread C
>     if( y == 1 )
>         assert( x == 1 ); // true
> 
> In essence, Intel currently implements CAS by either storing the new
> value /or/ re-storing the old value based on the result of the
> comparison, and because all stores from a single processor are ordered,
> Thread C is therefore guaranteed to see the store to x before the store
> to y.
> 
> As cool as I find the above solution, however, I do hope that this helps
> to demonstrate the complexity of lock-free programming.  It also shows
> just how complex analysis of this stuff is.  Even with the full source
> code available it would take some doing for a compiler to recognize a
> problem similar to the above.
> 
> 
> Sean

For that example, section 2.8 kicks in, locked instructions (such as
CAS) help constrain ordering.

So.. summary.  Reordering is real, even on x86 class hardware.  To make
life even more interesting, there's also various cpu bugs that help make
things even worse.  See this thread (unconfirmed info, but interesting
non-the-less) on the linux-kernel mailing list:

http://www.ussg.iu.edu/hypermail/linux/kernel/0808.0/0882.html

Whee,
Brad



More information about the Digitalmars-d mailing list