Is core.internal.atomic.atomicFetchAdd implementation really lock free?

max haughton maxhaton at gmail.com
Sun Dec 4 12:39:44 UTC 2022


On Sunday, 4 December 2022 at 10:15:48 UTC, claptrap wrote:
> On Sunday, 4 December 2022 at 01:51:56 UTC, max haughton wrote:
>> On Saturday, 3 December 2022 at 21:05:51 UTC, claptrap wrote:
>>> On Saturday, 3 December 2022 at 20:18:07 UTC, max haughton 
>>> wrote:
>>>> On Saturday, 3 December 2022 at 13:05:44 UTC, claptrap wrote:
>>>>> [...]
>>>>
>>>> If it provides the same memory ordering guarantees, does it 
>>>> matter (if we ignore performance for a second)? There are 
>>>> situations where you do (for reasons beyond performance) 
>>>> actually need an efficient (no-overhead) atomic operation in 
>>>> lock-free coding, but these are really on the edge of what 
>>>> can be considered guaranteed by any specification.
>>>
>>> It matters because the whole point of atomic operations is 
>>> lock-free coding. It's not that you *might* need atomics for 
>>> lock-free coding, you literally have to have them. If they 
>>> fall back on a mutex it's not lock-free anymore.
>>>
>>> Memory ordering is a somewhat orthogonal issue from atomic 
>>> ops.
>>
>> Memory ordering is literally why modern atomic operations 
>> exist. That's why there's a lock prefix on the instruction in 
>> X86 - it doesn't just say "do this in one go" it says "do this 
>> in one go *and* maintain this memory ordering for the other 
>> threads".
>
> No it's not, literally go read the Intel SDM: the lock prefix 
> is for atomic operations, that's its intent. That it also has 
> extra memory ordering guarantees between cores is simply 
> because it would be useless if it didn't also do that.
>
> The mfence/sfence/lfence instructions are for memory ordering.
>
> If you still don't believe me, consider this: x86 has always had 
> strong memory ordering, so there's no need to worry about reads 
> to a given location being moved ahead of writes to the same 
> location if you are on a single core. That only goes out of the 
> window when you have a multicore x86. The lock prefix existed 
> before x86 CPUs had multiple cores, i.e. locked instructions are 
> for atomics; the memory ordering guarantees already existed for 
> all reads/writes on single cores. The extra guarantees for 
> memory ordering between cores were added when x86 went 
> multicore, because lock-free algorithms would not work if the 
> atomic reads/writes were not also ordered between cores.
>
> To say atomics are about memory ordering is like saying cars 
> are for parking. Yes you have to park them somewhere, but that's 
> not the problem they are meant to solve.
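
[Editor's note: the "falls back on a mutex" distinction above can be checked at runtime. A minimal C++ sketch (C++ rather than D, since the reply below notes D's atomics are inherited from C++11); the name `lockFreeCounter` is illustrative, not from the thread:]

```cpp
#include <atomic>

std::atomic<long> counter{0};

// Reports whether atomic operations on `counter` map to native
// atomic instructions. If this returns false, the implementation
// falls back on a lock, and any algorithm built on top of it is
// no longer lock-free in the sense claptrap describes.
bool lockFreeCounter() {
    return counter.is_lock_free();
}
```

On mainstream 64-bit targets this is expected to return true for a `long`-sized counter; oversized or misaligned types are where library fallbacks typically appear.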

X86 isn't the only processor; by atomics I am referring to the 
ones we use in programming languages (note that I said modern). 
The ones in D are basically inherited from C++11 (and LLVM), and 
were drafted because working with memory ordering prior to them 
was the wild west.
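
[Editor's note: a minimal sketch of that C++11 model, which is what the thread's `atomicFetchAdd` mirrors; the function name `bump` is illustrative:]

```cpp
#include <atomic>

std::atomic<int> counter{0};

int bump() {
    // One atomic read-modify-write with sequentially consistent
    // ordering, the spirit of D's atomicFetchAdd(counter, 1).
    // On x86-64 a lock-free implementation compiles this to a
    // single `lock xadd` instruction.
    return counter.fetch_add(1, std::memory_order_seq_cst);
}
```

The point of the C++11 design is that the ordering is part of the operation's signature, rather than something reconstructed from fences after the fact.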

X86 is also still ordered with or without a LOCK prefix. It's a 
weaker kind of ordering, but it's still defined (it actually 
wasn't defined to an academically satisfactory standard until 
shockingly recently). For example, a store with release ordering 
on X86 will yield a regular mov instruction.
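
[Editor's note: a hedged C++ sketch of that release-store point; the producer/consumer names are illustrative, not from the thread:]

```cpp
#include <atomic>

std::atomic<int>  data{0};
std::atomic<bool> ready{false};

void producer() {
    data.store(42, std::memory_order_relaxed);
    // On x86-64 this release store compiles to a plain `mov`:
    // the hardware's strong (TSO) ordering already forbids
    // moving it before the store to `data`, so no fence or
    // LOCK-prefixed instruction is needed.
    ready.store(true, std::memory_order_release);
}

int consumer() {
    // The acquire load pairs with the release store above, so
    // once `ready` is seen true, the write to `data` is visible.
    while (!ready.load(std::memory_order_acquire)) { }
    return data.load(std::memory_order_relaxed);
}
```

On a weakly ordered target such as ARM the same source emits an actual ordering instruction (e.g. a store-release), which is why the ordering lives in the language model rather than in any one ISA's behavior.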

