D 2.0 FAQ on `shared`

Marco Leise via Digitalmars-d digitalmars-d at puremagic.com
Tue Oct 21 06:20:39 PDT 2014


On Mon, 20 Oct 2014 16:18:51 +0000,
"Sean Kelly" <sean at invisibleduck.org> wrote:

> On Monday, 20 October 2014 at 13:29:47 UTC, Marco Leise wrote:
> >
> > What if I have a thread that contains some shared data? Should
> > the thread be created as shared, be cast to shared after
> > construction or not be shared and fine grained shared applied
> > to the respective shared data fields?
> 
> Since Thread is by its very nature a shared thing, Thread should 
> probably be defined as shared.  But more generally it depends on 
> the use case.
 
In a single-threaded application in particular, there is an
unshared thread :p
But to the point: Doesn't defining it as shared mean that it
cannot have _any_ unshared methods? Ok, fair enough. So even
if a method only works on technically unshared parts of the
thread's data, it has to cast everything to unshared itself.
That makes sense, since `this`, the Thread itself, is still
shared.

> […]
> > If I have a Mutex to protect my unit of shared data, I
> > don't need "volatile" handling of shared data.
> >
> >     private shared class SomeThread : Thread
> >     {
> >     private:
> >
> >         Condition m_condition;
> >         bool m_shutdown = false;
> >         ...
> >     }
> 
> Yep.  This is one of my biggest issues with shared as it applies 
> to user-defined types.  I even raised it in the now defunct 
> concurrency mailing list before the design was finalized.  Sadly, 
> there's no good way to sort this out, because:
> 
> shared class A {
>      int m_count = 0;
>      void increment() shared {
>          m_count.atomicOp!"+="(1);
>      }
> 
>      int getCount() synchronized {
>          return m_count;
>      }
> }
> 
> If we make accesses of shared variables non-atomic inside 
> synchronized methods, there may be conflicts with their use in 
> shared methods.  Also:

Well, when you talk about "shared and unshared operations"
further down, I took that to mean the set of operations that
ensures thread-safety over a particular set of shared data.
The code above is simply a broken set of such operations:
in this case the programmer must decide between mutex
synchronization and atomic read-modify-write, and stick with
one. That's not too much to ask.
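
To illustrate, here is a sketch of that class made consistent
by using atomics on both paths (my example, not from the FAQ;
assumes core.atomic):

import core.atomic;

shared class A
{
	private int m_count = 0;

	void increment() shared
	{
		m_count.atomicOp!"+="(1);
	}

	int getCount() shared
	{
		// atomic read instead of taking the object's mutex
		return m_count.atomicLoad();
	}
}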

> shared class A {
>      void doSomething() synchronized {
>          doSomethingElse();
>      }
> 
>      private void doSomethingElse() synchronized {
> 
>      }
> }
> 
> doSomethingElse must be synchronized even if I as a programmer 
> know it doesn't have to be because the compiler insists it must 
> be.  And I know that private methods are visible within the 
> module, but the same rule applies.  In essence, we can't avoid 
> recursive mutexes for implementing synchronized, and we're stuck 
> with a lot of recursive locks and unlocks no matter what, as soon 
> as we slap a "shared" label on something.

Imagine you have a shared root object that contains a deeply
nested private data structure that is technically unshared.
Then it is not only one more method of the root object that
needs to be `synchronized`; the requirement cascades all the
way down its private fields as well. One ends up requiring
data structures designed for single-threaded execution to
grow synchronized methods overnight, even though they aren't
_really_ used concurrently by multiple threads.
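
A minimal sketch of that cascade (types made up for
illustration):

// A plain container, designed for single-threaded use.
struct IntList
{
	private int[] m_data;
	void put(int v) { m_data ~= v; }
}

shared class Root
{
	private IntList m_list; // transitively shared(IntList)

	void add(int v) synchronized
	{
		// m_list.put(v); // error: not callable on shared(IntList)
		// The mutex already guarantees exclusive access,
		// yet we must cast away shared to reuse IntList:
		(cast(IntList*) &m_list).put(v);
	}
}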
 
> >> What are the semantics of casting FROM shared TO
> >> unshared?
> >>
> >> Make sure there are no other shared references to that same
> >> data.
> >
> > That's just wrong to ask. `SomeThread` is a worker thread and
> > data is passed to it regularly through a shared reference that
> > is certainly never going away until the thread dies.
> > Yet I must be able to "unshare" its list of work items to
> > process them.
> 
> Sure, but at that point they are no longer referenced by the 
> shared Thread, correct?

The work items? They stay referenced by the shared Thread
until it is done with them. In this particular implementation,
an item is moved from the list to a separate field that
denotes the current item, and then the Mutex is released.
This current item is technically unshared now, because only
this thread can really see it, but as far as the language is
concerned there is a shared reference to it, because shared
applies transitively.
The same goes for the list of items while it is under the
Mutex's protection.

> The rule is simply that you can't be 
> trying to read or write data using both shared and unshared 
> operations, because of that reader-writer contract I mentioned 
> above.

Something along that line, yes. The exact formulation may need
to be ironed out, but what the FAQ says right now doesn't
work. When you say (un)shared operations, I read that as any
means of ensuring thread-safe operations on a set of data.
It can range from putting synchronized in front of your class
to the exact order of executing a series of loads and stores
in a lock-free algorithm.
Anything between a single shared basic data type and a
full-blown synchronized class is too complicated for the
compiler to see through. So a simple definition of `shared`
like the one the FAQ attempts won't fly. Most methods are
"somewhat shared":

private void workOnAnItem() shared
{
	// m_current is technically never shared,
	// but we cannot describe this with `shared`.
	// Hence I manually unshare where appropriate.

	synchronized (m_condition.unshared.mutex)
	{
		m_current = m_list.unshared.front;
		m_list.unshared.removeFront();
	}
	m_current.unshared.doSomething();
}
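
(`unshared` above is my own helper, not something from
druntime. A minimal sketch of it, assuming it is nothing more
than a cast that strips the qualifier:)

// Hypothetical helper: reinterpret a shared lvalue as
// unshared. All the usual caveats of casting away shared
// apply; the caller must guarantee exclusive access.
@property ref T unshared(T)(ref shared T value) @system
{
	return *cast(T*) &value;
}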
 
> > […]
> >
> > So the text should read:
> >
> >> What are the semantics of casting FROM shared TO unshared?
> >>
> >> Make sure that during the period the data is unshared, no
> >> other thread can modify those parts of it that you will be
> >> accessing. If you don't use synchronization objects with
> >> built-in memory-barriers like a Mutex, it is your
> >> responsibility to properly synchronize data access through
> >> e.g. atomicLoad/Store.
> >
> > That at least in general sanctifies casting away shared for
> > the purpose of calling a method under the protection of a
> > user-defined critical section.
> 
> It's more complicated than that, because you don't know how long 
> a given operation needs to propagate to another CPU.  Simply 
> performing a shared write is meaningless if something else is 
> performing an unshared read because the optimization happens at 
> both points--the write side and the read side.

You are right; my point was that the original formulation is
so strict that it can only come from the point of view of
using shared for message passing. It doesn't spend a thought
on how a shared(Thread) is supposed to be both referable from
other threads and able to unshare its internal lists during
processing.

> In essence, the CPU performs the same optimizations as a 
> compiler. […]

Yeah, I know, except for the DEC Alpha part.

> Basically what's needed is some way to have the compiler optimize 
> according to the same rules as the CPU (the goal of "shared").  
> Or in lieu of that, to have some "don't optimize this" 
> instruction to tell the compiler to keep its dirty hands off 
> your carefully constructed code so the only thing you need to 
> worry about is what the CPU is trying to do.  This is what 
> "volatile" was meant for in D1 and I really liked it, but I think 
> I was the only one.

Count me in. Anecdotally, I once tried to see if I could
write a minimal typed malloc() that is faster than tcmalloc.
It went way over budget from a single CAS instruction, whereas
tcmalloc mostly works on thread-local pools.
Such synchronization stalls the CPU so much that I don't see
how someone writing lock-free algorithms with `shared` would
accept implicit full barriers placed by the language.
That is a dead end to me.
Mostly what I use is load-acquire and store-release, but
sometimes a raw atomic read is sufficient as well.
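
As a sketch of that usage with core.atomic's existing
interface (variable names made up):

import core.atomic;

shared bool g_ready;
shared int  g_payload;

void producer()
{
	// a raw store suffices; the release below orders it
	atomicStore!(MemoryOrder.raw)(g_payload, 42);
	atomicStore!(MemoryOrder.rel)(g_ready, true); // store-release
}

void consumer()
{
	// load-acquire pairs with the store-release above
	while (!atomicLoad!(MemoryOrder.acq)(g_ready)) {}
	auto v = atomicLoad!(MemoryOrder.raw)(g_payload); // raw atomic read
	assert(v == 42);
}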

So ideally I would like to see:

volatile -> the compiler doesn't reorder accesses

and on top of that:

atomicLoad/Store -> the CPU doesn't reorder operations in the
                    pipeline beyond what I specify via
                    MemoryOrder.xxx

A shared variable need not be volatile, but a volatile
variable is implicitly shared.

> There's a paper on release consistency that I think is fantastic. 
>   I'll link it later if I can find it on the interweb.  CPUs seem 
> to be converging on even more strict memory ordering than release 
> consistency, but the release consistency model is really 
> fantastic as it's basically equivalent to how mutexes work and so 
> it's a model everyone already understands.

-- 
Marco


