Compiler optimization breaks multi-threaded code

stephan none at example.com
Tue Nov 16 14:11:16 PST 2010


On 16.11.2010 18:09, Sean Kelly wrote:
> cas() contains an asm block. Though I guess in this case the compiler
> isn't actually optimizing across it. Does atomic!"+="(&cnt, 1) work
> correctly?  I know the issue with shared would still have to be fixed,
> but that code uses asm for the load as well, so it probably won't be
> optimized the same way.

Thanks for looking into the issue around here. Just three comments from 
my side, Sean.

Disclaimer: based on a couple of hours chasing a bug and not much D 
experience (but some optimizing C++ compiler experience - so the issue 
looked familiar :-) )

1) atomicOp is not affected. It reads memory only once per function 
call. Whether that read comes from a local variable that was loaded from 
something global or directly from the global doesn't really matter 
(except for timing, maybe).

2) You are right, the compiler seems to not optimize across asm 
statements. So, the example can be fixed with the following hack:
     void atomicInc() {
         uint o;
         while ( !cas( &cnt, o, o + 1 ) ) {
             asm { nop; }  // hack: the compiler seems not to optimize across asm
             o = cnt;      // re-read the shared counter before retrying
         }
     }
This is, however, more brittle than it looks, because it is not always 
clear what "optimizing across an asm block" means. This version has the 
issue again:
     void atomicInc() {
         uint o = cnt;
         do {
             asm { nop; }
             o = cnt;
         } while ( !cas( &cnt, o, o + 1 ) );
     }
While this case might look somewhat obvious, I encountered some problems 
in more complex code, and finally went for the all-inline-assembler 
solution to be on the safe side.
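
For comparison, here is a minimal sketch of the same loop without any 
inline assembler. It assumes a druntime whose core.atomic provides 
atomicLoad; I have not verified that it avoids the optimizer issue on the 
compiler discussed in this thread. The idea is that every retry re-reads 
cnt through an explicit atomic load rather than a plain read that the 
optimizer may treat as redundant:

```d
import core.atomic;

shared uint cnt;

// Increment cnt with a compare-and-swap retry loop.
void atomicInc()
{
    uint o;
    do
    {
        // Explicit atomic load: re-read the counter on every attempt.
        o = atomicLoad(cnt);
    } while (!cas(&cnt, o, o + 1));
}
```

Calling atomicInc() from several threads should leave cnt equal to the 
total number of calls, provided the load is really performed on each 
iteration.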

3) During my debugging, I believe I saw the optimizer re-ordering not 
only reads of shared variables, but also writes to them. IIRC, my Dekker 
example on SO (which fails because the s/l/mfence instructions are 
missing) also shows a re-ordering of the lines
         cnt++;
         turn2 = true; flag1 = false;
into
         turn2 = true;
         cnt++;
         flag1 = false;
which in this case is not really important, but might introduce another 
bug if I were prepared to live with the risk of starvation (and removed 
turn2). If the compiler still re-ordered the writes then (I haven't 
tested), cnt++ would end up outside of the critical section.
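
To make the ordering concern concrete, here is a hedged sketch of the 
exit path written with explicit atomic stores. It assumes core.atomic's 
atomicStore with MemoryOrder.rel is available. The variable names mirror 
the lines above; leaveCriticalSection is a hypothetical helper, and cnt 
is declared __gshared only so that the plain cnt++ compiles. A release 
store is meant to keep earlier writes from being moved after it, so 
cnt++ should stay ahead of the store that gives up the section. This 
covers only the exit path and does not address the fences that the entry 
protocol of Dekker's algorithm needs.

```d
import core.atomic;

shared bool flag1 = true;   // thread 1 is in (or wants) the critical section
shared bool turn2 = false;
__gshared uint cnt;         // protected by the critical section, not atomic itself

// Hypothetical helper: the last steps of thread 1's critical section.
void leaveCriticalSection()
{
    cnt++;                                        // still inside the critical section
    atomicStore!(MemoryOrder.rel)(turn2, true);   // release: earlier writes stay before this
    atomicStore!(MemoryOrder.rel)(flag1, false);  // gives up the critical section
}
```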

Hope this helps & cheers,
Stephan

