GC, the simple solution
Bruno Medeiros
brunodomedeirosATgmail at SPAM.com
Wed Jun 14 10:14:44 PDT 2006
Sean Kelly wrote:
> Bruno Medeiros wrote:
>>
>> Ah, I see. I had recently read about out-of-order execution
>> problems in the Memory Barrier Wikipedia entry, and I had gotten the
>> impression (out of nowhere) that the hardware took care of that
>> transparently (i.e., without program intervention), but that is
>> not the case, or at least not always?
>
> A single CPU is allowed to do whatever it wants so long as it can fool
> the user into thinking it's executing instructions in a purely
> sequential manner. However, the only information other CPUs in the
> system have is what they observe on the system bus. Obviously, so long
> as CPUs aren't sharing data there aren't any problems. But things get
> sticky when this isn't the case. The memory model defines observable
> behavior allowed for a given architecture, as well as any methods
> offered for affecting that behavior.
>
> Say a specific architecture can operate much more quickly if it is
> allowed to perform writes to memory (from cache) in order of ascending
> address rather than in the order they were issued in code. There's no
> way to lie to other CPUs about the order in which writes occur and still
> have the optimization have any effect, so the designers state in the
> memory model spec that this architecture is allowed to reorder writes
> and then typically provide some means for overriding this behavior
> should it prove necessary.
>
> Now let's suppose you have two threads doing something like this:
>
> thread/CPU 1:
>
> A = 1;
> B = 2;
>
> thread/CPU 2:
>
> if( A == 1 )
> {
> assert( B == 2 );
> }
>
> Given the order in which A and B were declared, and therefore the order
> in which the writes occur, this assert may or may not fail.
>
> Enter memory barriers. Memory barriers are simply a way for the
> programmer to tell the CPU "I don't care if your way is faster, A simply
> must be written to memory before B or thread 2 will explode." So the
> CPU behind thread 1 does as it's told at great expense to performance
> and thread 2 doesn't melt down.
>
> The sticky part is that hardware designers don't agree with one another
> on how things should work and they never take the advice of the software
> people, so all architectures have different sets of observable behavior
> and different methods for working around it when necessary. However,
> the common concept is that memory barriers all constrain the order in
> which memory accesses may occur with respect to each other. Think of it
> as an assertion that "X may not occur before Y" or "X may not occur
> after Y" at the instruction level.
>
> The x86 is actually a bit weird in this regard as it has no formal
> memory barriers for normal operations (though it has the LFENCE, SFENCE,
> and MFENCE instructions for SSE use). I think this is largely for historical
> reasons--x86 PCs couldn't do SMP at all until fairly recently so none of
> this mattered, and the memory model has always been fairly strict (it
> was actually sequential until not terribly long ago). Also, the LOCK
> instruction acts as a heavy-handed sort of memory barrier as well, so
> there has been little motivation to add new instructions for
> finer-grained control.
>
>
> Sean
Nice post!
Makes me think: how does one keep up with this? I mean, someone who isn't
(and doesn't wish to be) a hardware expert, but wants to follow the
general developments in this area and maintain an overview of it.
--
Bruno Medeiros - CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
More information about the Digitalmars-d mailing list