GC, the simple solution

Tue Jun 13 19:40:07 PDT 2006

Bruno Medeiros wrote:
> 
> Ah, I see. I had recently read about relative out-of-order execution 
> problems in the Memory Barrier wikipedia entry, and I got the impression 
> (out of nowhere) that the hardware took care of that transparently 
> (i.e., without program intervention), but then, that is not the case, 
> not allways at least?

A single CPU is allowed to do whatever it wants so long as it can fool 
the user into thinking it's executing instructions in a purely 
sequential manner.  However, the only information other CPUs in the 
system have is what they observe on the system bus.  Obviously, so long 
as CPUs aren't sharing data there aren't any problems.  But things get 
sticky when this isn't the case.  The memory model defines observable 
behavior allowed for a given architecture, as well as any methods 
offered for affecting that behavior.

Say a specific architecture can operate much more quickly if it is 
allowed to perform writes to memory (from cache) in order of ascending 
address rather than in the order they were issued in code.  There's no 
way to lie to other CPUs about the order in which writes occur and still 
have the optimization have any effect, so the designers state in the 
memory model spec that this architecture is allowed to reorder writes 
and then typically provide some means for overriding this behavior 
should it prove necessary.

Now let's suppose you have two threads doing something like this:

   thread/CPU 1:

     A = 1;
     B = 2;

   thread/CPU 2:

     if( A == 1 )
     {
       assert( B == 2 );
     }

Given the order in which a and b were declared, and therefore the order 
in which the writes occur, this assert may or may not fail.

Enter memory barriers.  Memory barriers are simply a way for the 
programmer to tell the CPU "I don't care if your way is faster, A simply 
must be written to memory before B or thread 2 will explode."  So the 
CPU behind thread 1 does as it's told at great expense to performance 
and thread 2 doesn't melt down.

The sticky part is that hardware designers don't agree with one another 
on how things should work and they never take the advice of the software 
people, so all architectures have different sets of observable behavior 
and different methods for working around it when necessary.  However, 
the common concept is that memory barriers all constrain the order in 
which memory accesses may occur with respect to each other.  Think of it 
as an assertion that "X may not occur before Y" or "X may not occur 
after Y" at the instruction level.

The x86 is actually a bit weird in this regard as it has no formal 
memory barriers for normal operations (though it has the FENCE 
instructions for SSE use).  I think this is largely for historical 
reasons--x86 PCs couldn't do SMP at all until fairly recently so none of 
this mattered, and the memory model has always been fairly strict (it 
was actually sequential until not terribly long ago).  Also, the LOCK 
instruction acts as a heavy-handed sort of memory barrier as well, so 
there has been little motivation to add new instructions for 
finer-grained control.

Sean