[dmd-concurrency] draft 7

Sean Kelly sean at invisibleduck.org
Mon Feb 1 22:47:45 PST 2010


On Feb 1, 2010, at 10:24 PM, Andrei Alexandrescu wrote:
> 
> If you or I don't know of any machine that would keep on reading a datum from its own cache instead of consulting the main memory, we're making an assumption. The right thing to do is to keep the compiler _AND_ the machine in the loop by using appropriate fencing.

I did some googling before my last reply to try to find a NUMA architecture that works this way, but I didn't have any luck.  Cache coherency is just too popular these days.  Still, it seems completely reasonable to include such an assertion in a standard as broadly applicable as POSIX.  I'm not sure I agree that fencing is always necessary though--we're really operating at the level of "assumptions regarding hardware behavior" here.

>> It is wrong to assume that a processor's cache might never be updated; it will be updated at some point, and a barrier (in general) will not force the update to be performed immediately.
> 
> According to my reading of Butenhof's text, there is nothing stopping a machine from delaying reads indefinitely if you don't insert a barrier.

This is certainly a useful way to think about concurrent code--it makes problems a lot easier to find.  But I think it's a bit difficult to apply at times.  x86, for example, still really only provides the LOCK prefix as an ad-hoc memory barrier--it isn't even explicitly intended for this purpose, and it's often more heavyweight than necessary.  The only way to perform a load-acquire or a store-release on x86 is to just do a plain old MOV.  It's really too bad that RC (release consistency) never caught on; the model is very programmer-oriented.

>> This issue is a priori disconnected from the presence of barriers (well, one would hope that a correct compiler disables "upgrading to registers" for variables in a locked section crossing the boundary of it, but a priori it doesn't have to be like that, and especially using just memory barriers, I would not trust current compilers to always do the correct thing without a "volatile").
> 
> Agreed. There must be a special construct understood by the compiler to (a) prevent reordering and enregistering and (b) insert the appropriate barrier instruction. One without the other is useless.

I think there's definite value in just being able to constrain compiler optimization, since it's necessary for anyone who wants to implement a concurrency library.  As long as the compiler is required to treat inline asm as effectively volatile, though, we're in decent shape on platforms where the compiler supports inline asm.  For others I guess we'll have to trust that no optimization would occur across opaque function calls?  I recall this sort of thing coming up in the C++0x memory model talks, and I wish I had a clearer recollection of what came out of it... or that Hans Boehm were a part of this conversation.

