Who Ordered Memory Fences on an x86?

Thu Nov 6 01:03:03 PST 2008

Nick Sabalausky wrote:
> "Walter Bright" <newshound1 at digitalmars.com> wrote in message 
> news:geu161$91s$1 at digitalmars.com...
>> Nick Sabalausky wrote:
>>> Call me a grumpy old fart, but I'd be happy just tossing fences in 
>>> everywhere (when a multicore is detected) and be done with the whole 
>>> mess, just because trying to wring every little bit of speed from, say, a 
>>> 3+ GHz multicore processor strikes me as a highly unworthy pursuit. I'd 
>>> rather optimize for the lower end and let the fancy overpriced crap 
>>> handle it however it will.
>>>
>>> And that's even before tossing in the consideration that (to my dismay) 
>>> most code these days is written in languages/platforms (ex, "Ajaxy" 
>>> web-apps) that throw any notion of performance straight into the trash 
>>> anyway (what's 100 extra cycles here and there, when the 
>>> browser/interpreter/OS/whatever makes something as simple as navigation 
>>> and text entry less responsive than it was on a 1MHz 6502?).
>> Bartosz, Andrei, Sean and I have discussed this at length. My personal 
>> view is that nobody actually understands the proper use of fences (the CPU 
>> documentation on exactly what they do is frustratingly obtuse, which does 
>> not help at all). Then there's the issue of fences behaving very 
>> differently on different CPUs. If you use explicit fences, you have no 
>> hope of portability.
> 
> From reading the article, I was under the impression that not using explicit 
> fences led to CPUs inevitably making false assumptions and thus spitting 
> out erroneous results. So it sounds like explicit fences are a case of 
> "damned if you do, damned if you don't": i.e., "Use explicit fences everywhere 
> and you get unportable machine code. Don't use explicit fences and you get 
> errors." Is this accurate? (If so, what a mess!) Also, one thing I'm a little 
> unclear on: is this whole mess only applicable when multiple cores are in 
> use, or do the same problems crop up on unicore chips?

In theory, you can write a portable program that uses explicit fences. 
The problem is that you have to design for the absolute worst-case CPU, 
which basically means putting an absolute read/write fence in almost 
every conceivable place.  That would make performance suck.

For reasonable performance, you need to cut down on the fences to only 
those that are required in order to get correctness...and that is 
definitely *not* portable.
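
To make the trade-off concrete, here is a rough sketch in C++ (using the
std::atomic fences that were standardized after this thread, so treat it as
an illustration and not as anything discussed above). All the names
(payload, ready, produce_full, produce_min, run, ...) are made up for the
example. The "full" pair puts a sequentially consistent fence at every
step, which is the design-for-the-worst-case approach. The "min" pair uses
only a release fence on the writer and an acquire fence on the reader,
which is all this particular hand-off needs. On x86 with mainstream
compilers the release/acquire fences typically cost no instruction at all,
while the full fence does; on weaker-ordered CPUs both usually cost
something.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int payload = 0;                 // ordinary, non-atomic data being handed off
std::atomic<bool> ready{false};  // flag that publishes the payload

// Worst-case style: a full (sequentially consistent) fence at each step.
void produce_full() {
    payload = 42;
    std::atomic_thread_fence(std::memory_order_seq_cst);
    ready.store(true, std::memory_order_relaxed);
}

int consume_full() {
    while (!ready.load(std::memory_order_relaxed)) {
        std::this_thread::yield();
    }
    std::atomic_thread_fence(std::memory_order_seq_cst);
    return payload;
}

// Trimmed-down style: only the ordering this hand-off actually requires.
void produce_min() {
    payload = 42;
    // Keeps the payload write from moving below the flag store.
    std::atomic_thread_fence(std::memory_order_release);
    ready.store(true, std::memory_order_relaxed);
}

int consume_min() {
    while (!ready.load(std::memory_order_relaxed)) {
        std::this_thread::yield();
    }
    // Keeps the payload read from moving above the flag load.
    std::atomic_thread_fence(std::memory_order_acquire);
    return payload;
}

// Hypothetical helper: runs one producer and one consumer on two threads
// and returns the value the consumer observed.
int run(void (*producer)(), int (*consumer)()) {
    payload = 0;
    ready.store(false);
    int seen = -1;
    std::thread writer(producer);
    std::thread reader([&seen, consumer] { seen = consumer(); });
    writer.join();
    reader.join();
    return seen;
}
```

Calling run(produce_full, consume_full) or run(produce_min, consume_min)
should hand the same value across in both cases; the difference is only in
how much ordering (and therefore cost) each version asks the hardware for.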