synchronized class bugs?

Wed Apr 8 22:50:07 UTC 2020

On Wednesday, 8 April 2020 at 13:43:08 UTC, Steven Schveighoffer 
wrote:
> This is why we are moving to REQUIRING casting away shared to 
> read/write data -- you have to do it anyway.
>

OK, I understand the rationale, even though I don't like the 
solution very much. You shouldn't really use atomic ops when 
locks on the affected variables are held. That's bad practice.

>> Bug number 2: even when casting away shared, the end result is 
>> wrong when using multiple threads. Thus, the monitor locking 
>> code in druntime cannot be correct. It does have an effect, 
>> though. Omitting synchronized on Shared results in values that 
>> are wide off the mark. Including it results in small 
>> differences. This suggests that a memory fence might be 
>> missing somewhere.
>> 
>> The observed reality contradicts the documentation in many 
>> ways and is inconsistent with itself. What is going on here?
>
> synchronized classes call _d_monitorenter and _d_monitorexit, 
> which lock OS primitives (and initialize the monitor if 
> necessary). These have not been touched in years, and depend on 
> the OS mutex which should be rock-solid. Looking at the 
> history, it looks like the last significant change was 
> https://github.com/dlang/druntime/commit/70b3536ddd06e8e1afb269b88ed3298ec03b8798#diff-d609308f6ce1b06db16e214b30db6f74 in 2015.
>

The calls to _d_monitorenter and _d_monitorexit are in the 
generated code and I've looked up their implementation before 
posting to make sure I have correct understanding.

I would also assume mutexes to be rock solid. This is why I am in 
utter disbelief about the actual behavior of the toy program (see 
code at the end).

> I've definitely depended on synchronized functions in the past, 
> and haven't had issues, but it was in the far past.
>
> The only place I can reasonably assume that there might be an 
> issue is for the initialization.
>
> To test this, try initializing the lock FIRST before spawning 
> multiple threads (just call one of the functions), and see if 
> it helps. If so, it might be a bug in the initialization 
> routines.
>

I added such function calls and that didn't change anything.

> Also if you find an issue, it may be specific to your OS.
>

Well, as I've mentioned before, I've reproduced this on Linux and 
Windows with dmd, ldc and gdc. So it's not OS specific, or at 
least not entirely.

> -Steve

The the full program code is below the line. You can run it on 
run.dlang.org to reproduce the problem.

----

import core.thread;

import std.stdio;

version = Sync;

version(Sync) {
     pragma(msg, "synchronized");

     synchronized class Shared {
     public:
         void increment(int value) {
             (cast(Shared) this).x += value;
         }

         void decrement(int value) {
             (cast(Shared) this).x -= value;
         }

         shared int get() { return x; }

     private:
         int x;
     }
} else {
     pragma(msg, "non-synchronized");

     class Shared {
     public:
         shared void increment(int value) {
             (cast(Shared) this).x += value;
         }

         shared void decrement(int value) {
             (cast(Shared) this).x -= value;
         }

         shared int get() { return x; }

     private:
         int x;
     }
}

void main()
{
     shared Shared s = new shared Shared();

     s.increment(1);
     s.decrement(1);
     writeln(s.get());

     Thread[] threads = new Thread[10];
     for(int i = 0; i < threads.length; i++) {
         threads[i] = new Thread({
             for(int j = 0; j < 1_000_000; j++) {
                 s.increment(i);
             }
             for(int j = 0; j < 1_000_000; j++) {
                 s.decrement(i);
             }
         });
         threads[i].start();
     }

     for(int i = 0; i < threads.length; i++) {
         threads[i].join();
     }

     writeln(s.get());
}