Synchronisation help

Mon Jan 1 19:49:28 UTC 2024

On Monday, January 1, 2024 8:48:16 AM MST Anonymouse via Digitalmars-d-learn 
wrote:
> I have a `shared string[int]` AA that I access from two different
> threads. The function I spawn to start the second thread takes
> the AA as an argument.
>
> ```d
> import std.concurrency : spawn, Tid;
>
> class Foo
> {
>      shared string[int] bucket;
>      Tid worker;
> }
>
> void workerFn(shared string[int] bucket)
> {
>      while (true)
>      {
>          // occasionally reads, occasionally modifies bucket
>      }
> }
>
> void main()
> {
>      auto foo = new Foo;
>      foo.bucket[0] = string.init;
>      foo.bucket.remove(0);
>      foo.worker = spawn(&workerFn, foo.bucket);
>
>      while (true)
>      {
>          // occasionally reads, occasionally modifies bucket
>      }
> }
> ```
>
> (`run.dlang.io` shortening seems broken again, but I made a
> [gist](https://gist.github.com/zorael/17b042c424cfea5ebb5f1f3120f983f4) of a
> more complete example.)
>
> Reading the specs on `synchronized` statements, it seems I need
> to provide an `Object` to base synchronisation on when two
> *different* places in the code need synchronising, whereas if
> it's in the same place an expressionless `synchronized { }` will
> do.
>
> The worker function can't see `Foo foo` inside `main`, so it
> can't share synchronisation on that.
>
> What is the common solution here? Do I add a module-level `Object
> thing` and move everything accessing the AA into
> `synchronized(.thing)` statements? Or maybe add a `shared static`
> something to `Foo` and synchronise with `synchronized(Foo.thing)`?

In general, I would advise against using synchronized statements. They
really don't add anything, particularly since in many cases you need access
to more complex thread-synchronization facilities anyway (e.g. condition
variables). Really, synchronized statements are just a Java-ism that D
picked up fairly early on, and they were arguably a mistake. So, I'd
typically suggest that folks just use Mutex from core.sync.mutex directly
(though you can certainly use synchronized statements if you don't need to
do anything more complex).

https://dlang.org/phobos/core_sync_mutex.html
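For illustration, here is a minimal sketch of using Mutex directly with the lock / cast-away-shared / unlock pattern. It assumes a compiler recent enough for Mutex to have shared overloads of its constructor, lock() and unlock(); the counter and the function names are hypothetical:

```d
import core.sync.mutex : Mutex;
import core.thread : Thread;

// Hypothetical helper: adds 1 to *counter `times` times, holding mtx
// around each update.
void increment(shared(int)* counter, shared Mutex mtx, int times)
{
    foreach (_; 0 .. times)
    {
        mtx.lock();
        scope (exit) mtx.unlock(); // released even if the body throws
        // Lock held: no other thread touches *counter right now, so casting
        // away shared is fine as long as the unshared pointer doesn't escape.
        auto p = cast(int*) counter;
        ++*p;
    }
}

int runCounterDemo()
{
    auto mtx = new shared Mutex();
    auto counter = new shared(int);
    auto t1 = new Thread({ increment(counter, mtx, 10_000); }).start();
    auto t2 = new Thread({ increment(counter, mtx, 10_000); }).start();
    t1.join();
    t2.join();
    // Both threads have finished, so nothing else can race with this read.
    return *cast(int*) counter;
}

void main()
{
    import std.stdio : writeln;
    writeln(runCounterDemo()); // 20000
}
```

The `scope (exit)` guarantees the unlock on every path out of the block, which is the main thing a synchronized statement would otherwise have given you.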

If you're using synchronized statements, you're essentially just using
syntax that locks and unlocks a mutex under the hood, without giving you
access to functionality like Condition from core.sync.condition.

https://dlang.org/phobos/core_sync_condition.html
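As a rough sketch of the kind of thing synchronized statements can't express on their own, here is a one-shot handoff built from a Mutex and a Condition bound to it. The Handoff struct and its helper functions are made up for this example:

```d
import core.sync.condition : Condition;
import core.sync.mutex : Mutex;
import core.thread : Thread;

// Hypothetical one-shot handoff of an int from one thread to another.
struct Handoff
{
    Mutex mtx;
    Condition cond; // constructed with mtx, so waiting releases mtx
    bool ready;
    int payload;
}

shared(Handoff)* makeHandoff()
{
    auto h = new Handoff;
    h.mtx = new Mutex();
    h.cond = new Condition(h.mtx);
    return cast(shared(Handoff)*) h; // share only once fully built
}

int awaitPayload(shared(Handoff)* h)
{
    h.mtx.lock();
    scope (exit) h.mtx.unlock();
    auto local = cast(Handoff*) h; // lock held: safe to cast away shared
    // wait() releases mtx while sleeping and re-acquires it before
    // returning; the loop also guards against spurious wakeups.
    while (!local.ready)
        local.cond.wait();
    return local.payload;
}

void providePayload(shared(Handoff)* h, int value)
{
    h.mtx.lock();
    scope (exit) h.mtx.unlock();
    auto local = cast(Handoff*) h;
    local.payload = value;
    local.ready = true;
    local.cond.notify(); // wake the waiting thread
}

int runHandoffDemo()
{
    auto h = makeHandoff();
    auto producer = new Thread({ providePayload(h, 42); }).start();
    immutable received = awaitPayload(h);
    producer.join();
    return received;
}

void main()
{
    import std.stdio : writeln;
    writeln(runHandoffDemo()); // 42
}
```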

However, regardless of whether you use synchronized or use Mutex directly,
what you need to do is to have an object that functions as a mutex to
protect the shared data, locking it whenever you access it so that only one
thread can access it at a time.

The best place to put that mutex varies depending on what your code is
doing. A shared static variable could make sense, but it's often the case
that you would put the mutex inside the class or struct that contains the
data that's shared across threads. Or if you don't have a type that's
intended to encompass what you're doing with the shared data, then it often
makes sense to create one to hold the shared data so that the code that's
using it doesn't have to deal with the synchronization mechanisms; instead,
all of that mess is contained entirely within the class or struct that
you're passing around. But even if you don't want to encapsulate it all
within a struct or class, simply creating one to hold both the shared data
and the mutex makes it so that they'll be together wherever you're passing
them around, making it easy for the code using the AA to access the mutex.
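A minimal sketch of that pairing, with hypothetical names (Guarded, makeGuarded, store, fetch). Nothing is encapsulated yet, so callers still have to lock the mutex themselves, but at least the mutex travels with the AA:

```d
import core.sync.mutex : Mutex;

// Hypothetical pairing of the AA with the mutex that protects it. Nothing is
// encapsulated: callers are still responsible for locking.
struct Guarded
{
    Mutex mutex;
    string[int] bucket;
}

shared(Guarded)* makeGuarded()
{
    auto g = new Guarded;
    g.mutex = new Mutex();
    return cast(shared(Guarded)*) g; // share only once fully initialized
}

void store(shared(Guarded)* g, int key, string value)
{
    g.mutex.lock();
    scope (exit) g.mutex.unlock();
    auto aa = cast(string[int]*) &g.bucket; // lock held: treat AA as local
    (*aa)[key] = value;
}

string fetch(shared(Guarded)* g, int key)
{
    g.mutex.lock();
    scope (exit) g.mutex.unlock();
    auto aa = cast(string[int]*) &g.bucket;
    if (auto p = key in *aa)
        return *p; // string data is immutable, so it may outlive the lock
    return null;
}

void main()
{
    import std.stdio : writeln;
    auto g = makeGuarded();
    store(g, 1, "one");
    writeln(fetch(g, 1)); // one
}
```

Returning the string after unlocking is fine only because string data is immutable; handing out a mutable reference into the AA would leak unprotected access.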

However, because you're not supposed to actually be mutating data while it's
shared (and the compiler largely prevents you from doing so), what you
generally need to do to operate on shared data is to lock the mutex that
protects it, cast away shared so that you can operate on the data, do
whatever it is that you need to do with the now thread-local data, make sure
that no thread-local references to the data exist any longer, and then unlock
the mutex. And to do that cleanly, it's often nice to create a struct or
class with shared member functions which take care of all of that for you,
so that that particular dance is encapsulated rather than requiring every
piece of code with access to that shared data to get the synchronization
right.
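A sketch of that encapsulation, assuming a hypothetical SharedBucket struct: the AA and its mutex are private, and each shared member function performs the lock / cast / operate / unlock sequence, so callers never see unshared data:

```d
import core.sync.mutex : Mutex;
import core.thread : Thread;

// Hypothetical wrapper: the mutex and the AA live together, and the
// lock / cast-away-shared / unlock dance happens only inside these methods.
struct SharedBucket
{
    private Mutex mutex;
    private string[int] bucket;

    static shared(SharedBucket)* create()
    {
        auto b = new SharedBucket;
        b.mutex = new Mutex();
        // Fully initialized while still thread-local; only now make it shared.
        return cast(shared(SharedBucket)*) b;
    }

    void set(int key, string value) shared
    {
        mutex.lock();
        scope (exit) mutex.unlock();
        auto aa = cast(string[int]*) &bucket; // lock held: view AA as local
        (*aa)[key] = value;
        // aa goes out of scope here, so no unshared reference escapes.
    }

    bool remove(int key) shared
    {
        mutex.lock();
        scope (exit) mutex.unlock();
        auto aa = cast(string[int]*) &bucket;
        return (*aa).remove(key);
    }

    string get(int key, string fallback = null) shared
    {
        mutex.lock();
        scope (exit) mutex.unlock();
        auto aa = cast(string[int]*) &bucket;
        return (*aa).get(key, fallback); // immutable string: safe to return
    }
}

void main()
{
    auto b = SharedBucket.create();
    auto t = new Thread({ foreach (i; 0 .. 100) b.set(i, "worker"); }).start();
    foreach (i; 100 .. 200)
        b.set(i, "main");
    t.join();
    assert(b.get(5) == "worker" && b.get(150) == "main");
}
```

Building the object while it's still thread-local and casting it to shared only once it's complete avoids having to write to shared fields directly.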

Given that you already have a class called Foo which contains the AA, I
would say that the most obvious thing to do would be to just pass a shared
Foo across threads rather than pass the AA from inside Foo. Then you can
either put a mutex in Foo that then naturally gets passed along with the AA,
or just use the class itself as a mutex - e.g. synchronized(this) {}
IIRC - since classes unfortunately have a mutex built into them to make
synchronized member functions work (which is useful when you want to use
synchronized functions but causes unnecessary bloat for most classes). And
if it makes sense to lock the mutex during entire function calls, you can
just make the member function synchronized rather than having synchronized
statements (though it often doesn't make sense to have locks cover that much
code at once).

Personally, I'd probably just make Foo a struct and pass a shared Foo* to
the other thread rather than bother with a class, since it doesn't look like
you're trying to do anything with inheritance, and in that case, you'd give
it an explicit mutex, since structs don't have mutexes built in. But if you
want to use a class and use its built-in mutex with synchronized, that works
too. Either way, it makes more sense to pass a shared object containing both
the AA and the mutex around than to pass the AA around by itself given that
the whole point of the mutex is to protect that AA.
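Applied to the original example, a sketch might look like the following. Foo becomes a struct holding both the AA and its mutex, and a shared(Foo)* is passed to spawn. The method names, the fixed loop counts, and the "done" message sent back to the owner are assumptions made to keep the example finite:

```d
import core.sync.mutex : Mutex;
import std.concurrency : ownerTid, receiveOnly, send, spawn;

// Hypothetical rework of Foo: a struct owning both the AA and its mutex.
struct Foo
{
    private Mutex mutex;
    private string[int] bucket;

    static shared(Foo)* create()
    {
        auto foo = new Foo;
        foo.mutex = new Mutex();
        return cast(shared(Foo)*) foo; // share only once fully initialized
    }

    void put(int key, string value) shared
    {
        mutex.lock();
        scope (exit) mutex.unlock();
        auto aa = cast(string[int]*) &bucket; // lock held: safe to unshare
        (*aa)[key] = value;
    }

    size_t count() shared
    {
        mutex.lock();
        scope (exit) mutex.unlock();
        return (*cast(string[int]*) &bucket).length;
    }
}

void workerFn(shared(Foo)* foo)
{
    foreach (i; 0 .. 1000)
        foo.put(i, "from worker");
    ownerTid.send(true); // tell the owner we're done
}

size_t runFooDemo()
{
    auto foo = Foo.create();
    spawn(&workerFn, foo);
    foreach (i; 1000 .. 2000)
        foo.put(i, "from main");
    receiveOnly!bool(); // wait for the worker to finish
    return foo.count();
}

void main()
{
    import std.stdio : writeln;
    writeln(runFooDemo()); // 2000
}
```

Since both threads reach the same AA through the pointer, the `bucket[0] = string.init; bucket.remove(0);` trick from the original code (which forces the AA to be allocated before it gets copied into the worker) is no longer needed.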

- Jonathan M Davis