Compiler optimization breaks multi-threaded code
stephan
none at example.com
Tue Nov 16 14:11:16 PST 2010
Am 16.11.2010 18:09, schrieb Sean Kelly:
> cas() contains an asm block. Though I guess in this case the compiler
> isn't actually optimizing across it. Does atomic!"+="(&cnt, 1) work
> correctly? I know the issue with shared would still have to be fixed,
> but that code uses asm for the load as well, so it probably won't be
> optimized the same way.
Thanks for looking into the issue around here. Just three comments from
my side, Sean.
Disclaimer: based on a couple of hours chasing a bug and not much D
experience (but some optimizing C++ compiler experience - so the issue
looked familiar :-) )
1) atomicOp is not concerned. You only read memory once in the function
call. Whether from a local variable that was loaded from something
global or directly from a global, doesn't really matter (except for
timing, maybe).
2) You are right, the compiler seems to not optimize across asm
statements. So, the example can be fixed with the following hack:
void atomicInc ( ) {
uint o;
while ( !cas( &cnt, o, o + 1 ) ) {
asm { nop; } o = cnt;
}
}
This is however more brittle than it looks, because it is not always
clear what "optimizing across an asm block". This version has the issue
again:
void atomicInc ( ) {
uint o = cnt;
do {
asm { nop; } o = cnt;
} while ( !cas( &cnt, o, o + 1 ) )
}
While this case might look somewhat obvious, I encountered some problems
in more complex code, and finally went for the all-inline-assembler
solution to be on the safe side.
3) During my debugging, I believe that I saw the optimizer not only
re-ordering reads of shared variables, but also writes to shared
variables. IIRC, my Dekker example on SO (which fails for the missing
s/l/mfence instructions), also sports a re-ordering of the lines
cnt++;
turn2 = true; flag1 = false;
into
turn2 = true;
cnt++;
flag1 = false;
which in this case is not really important, but might introduce another
bug if I was prepared to live with the risk of starvation (and remove
turn2). If the compiler would still re-order (haven't tested), cnt++
would be outside of the critical section.
Hope this helps & cheers,
Stephan
More information about the Digitalmars-d
mailing list