synchronized class bugs?

Wed Apr 8 13:43:08 UTC 2020

On 4/7/20 11:18 AM, Gregor Mückl wrote:
> I've been playing around with synchronized class. My example is the 
> following dummy class:
> 
>      synchronized class Shared {
>      public:
>          void increment(int value) {
>              //(cast(Shared) this).x += value;
>              x += value;
>          }
> 
>          void decrement(int value) {
>              //(cast(Shared) this).x -= value;
>              x -= value;
>          }
> 
>          shared int get() { return x; }
> 
>      private:
>          int x;
>      }
> 
> Then I'm calling increment and decrement many times in parallel threads 
> so that they *should* cancel out in the end and x should be back to 0 at 
> the end. With the implicit synchronization using d monitors, this should 
> work.
> 
> Bug number 1: dmd doesn't translate this without casting away shared:
> 
> sharedtest.d(14): Error: read-modify-write operations are not allowed 
> for shared variables. Use core.atomic.atomicOp!"+="(this.x, value) instead.
> 
> The += and -= operators are safe as they are inside locked monitors. The 
> emitted code contains calls to d_montiorenter and d_monitorexit. The 
> compiler should understand this.

The requirement for atomicOp is separate from the synchronization. The 
synchronization is only sound because you made it sound.

In fact, if you did:

auto getPtr() { return &x; }

Then the synchronization all of a sudden does not apply to that variable.

This is why we are moving to REQUIRING casting away shared to read/write 
data -- you have to do it anyway.

> Bug number 2: even when casting away shared, the end result is wrong 
> when using multiple threads. Thus, the monitor locking code in druntime 
> cannot be correct. It does have an effect, though. Omitting synchronized 
> on Shared results in values that are wide off the mark. Including it 
> results in small differences. This suggests that a memory fence might be 
> missing somewhere.
> 
> The observed reality contradicts the documentation in many ways and is 
> inconsistent with itself. What is going on here?

synchronized classes call _d_monitorenter and _d_monitorexit, which lock 
OS primitives (and initialize the monitor if necessary). These have not 
been touched in years, and depend on the OS mutex which should be 
rock-solid. Looking at the history, it looks like the last significant 
change was 
https://github.com/dlang/druntime/commit/70b3536ddd06e8e1afb269b88ed3298ec03b8798#diff-d609308f6ce1b06db16e214b30db6f74 
in 2015.

I've definitely depended on synchronized functions in the past, and 
haven't had issues, but it was in the far past.

The only place I can reasonably assume that there might be an issue is 
for the initialization.

To test this, try initializing the lock FIRST before spawning multiple 
threads (just call one of the functions), and see if it helps. If so, it 
might be a bug in the initialization routines.

Also if you find an issue, it may be specific to your OS.

-Steve