D 2.0 FAQ on `shared`

Sean Kelly via Digitalmars-d digitalmars-d at puremagic.com
Tue Oct 21 09:05:57 PDT 2014


On Tuesday, 21 October 2014 at 13:10:57 UTC, Marco Leise wrote:
> Am Mon, 20 Oct 2014 16:18:51 +0000
> schrieb "Sean Kelly" <sean at invisibleduck.org>:
>
> But to the point: Doesn't defining it as shared mean that it
> cannot have _any_ unshared methods? Ok, fair enough. So even
> if a method is only working on technically unshared parts of
> the thread's data, it has to cast everything to unshared
> itself. This makes sense since `this`, the Thread itself is
> still shared.

Good point about a shared class not having any unshared methods.  
I guess that almost entirely eliminates the cases where I might 
define a class as shared.  For example, the MessageBox class in 
std.concurrency has one or two ostensibly shared methods and the 
rest are unshared.  And it's expected for there to be both shared 
and unshared references to the object held simultaneously.  This 
is by design, and the implementation would either be horribly 
slow or straight-up broken if done another way.
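To sketch the shape I mean (a hypothetical example, not the actual 
std.concurrency internals -- the names are invented): a couple of 
methods are shared so other threads can put messages in, while the 
rest are unshared and only ever called through the owner thread's 
reference.

```d
import core.sync.mutex : Mutex;

// Hypothetical sketch -- NOT the real MessageBox implementation.
class MailBox
{
    private Mutex    m_lock;
    private string[] m_queue;

    this() { m_lock = new Mutex; }

    // Called through shared references held by other threads.
    void put(string msg) shared
    {
        // Strip shared to reach the mutex-protected state.
        synchronized ((cast() this).m_lock)
        {
            (cast() this).m_queue ~= msg;
        }
    }

    // Called only through the owner thread's unshared reference.
    string get()
    {
        synchronized (m_lock)
        {
            auto msg = m_queue[0];
            m_queue = m_queue[1 .. $];
            return msg;
        }
    }
}
```

The point being that both a shared and an unshared reference to the 
same object are live at once, on purpose, with the mutex (not the 
type system) doing the real work.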

Also, of the shared methods that exist, there are synchronized 
blocks but they occur at a fine grain within the shared methods 
rather than the entire method being shared.  I think that 
labeling entire methods as synchronized is an inherently flawed 
concept, as it contradicts the way mutexes are supposed to be 
used (which is to hold the lock for as short a time as possible). 
I hate to say it, but if I were to apply shared/synchronized 
labels to class methods it would simply be to service user 
requests rather than because I think it would actually make the 
code better or safer.
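The difference, roughly (a hypothetical sketch -- the class and 
field names are invented): the lock is held only around the access 
to shared state, not for the duration of the whole method.

```d
import core.sync.mutex : Mutex;

// Sketch: fine-grained locking inside a method, versus labeling
// the entire method `synchronized` (which would hold the mutex
// across the long-running work as well).
class Worker
{
    private Mutex m_lock;
    private int[] m_queue;

    this() { m_lock = new Mutex; }

    void process()
    {
        int item;
        synchronized (m_lock)        // lock held only for the pop
        {
            item = m_queue[0];
            m_queue = m_queue[1 .. $];
        }
        doWork(item);                // runs with the mutex released
    }

    private void doWork(int item) { /* long-running work */ }
}
```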


>> shared class A {
>>      int m_count = 0;
>>      void increment() shared {
>>          m_count.atomicOp!"+="(1);
>>      }
>> 
>>      int getCount() synchronized {
>>          return m_count;
>>      }
>> }
>> 
>> If we make accesses of shared variables non-atomic inside 
>> synchronized methods, there may be conflicts with their use in 
>> shared methods.  Also:
>
> Well, when you talk about "shared and unshared operations"
> further down, I took it as the set of operations ensuring
> thread-safety over a particular set of shared data. That code
> above is just a broken set of such operations. I.e. in this
> case the programmer must decide between mutex synchronization
> and atomic read-modify-write. That's not too much to ask.

I agree.  I was being pedantic for the sake of informing anyone 
who wasn't aware.  There are times where I have some fields be 
lock-free and others protected by a mutex though.  See 
Thread.isRunning, for example.  There are times where a write 
delay is acceptable and the possibility of tearing is irrelevant. 
But I think this falls pretty squarely into the "expert" 
category--I don't care if the language makes it easy.
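Something like this shape (hypothetical -- not the actual druntime 
code for Thread.isRunning): one word-sized flag is accessed 
lock-free, because tearing is impossible and a slightly stale read 
is acceptable, while the rest of the state stays behind the mutex.

```d
import core.atomic : atomicLoad, atomicStore;
import core.sync.mutex : Mutex;

// Hypothetical sketch in the spirit of Thread.isRunning.
class Task
{
    private shared bool m_running;  // lock-free: word-sized, staleness OK
    private Mutex       m_lock;
    private string      m_result;   // mutex-protected

    this() { m_lock = new Mutex; m_running = true; }

    @property bool isRunning() const
    {
        return atomicLoad(m_running);   // no lock taken
    }

    void finish(string result)
    {
        synchronized (m_lock) { m_result = result; }
        atomicStore(m_running, false);
    }
}
```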


>> shared class A {
>>      void doSomething() synchronized {
>>          doSomethingElse();
>>      }
>> 
>>      private void doSomethingElse() synchronized {
>> 
>>      }
>> }
>> 
>> doSomethingElse must be synchronized even if I as a programmer 
>> know it doesn't have to be because the compiler insists it 
>> must be.  And I know that private methods are visible within 
>> the module, but the same rule applies.  In essence, we can't 
>> avoid recursive mutexes for implementing synchronized, and 
>> we're stuck with a lot of recursive locks and unlocks no 
>> matter what, as soon as we slap a "shared" label on something.
>
> Imagine you have a shared root object that contains a deeply
> nested private data structure that is technically unshared.
> Then it becomes not only one more method of the root object
> that needs to be `synchronized` but it cascades all the way
> down its private fields as well. One ends up requiring data
> structures designed for single-threaded execution to
> grow synchronized methods overnight even though they aren't
> _really_ used concurrently by multiple threads.

I need to give it some more thought, but I think the way this 
should work is for shared to not be transitive, but for the 
compiler to require that non-local variables accessed within a 
shared method must either be declared as shared or the access 
must occur within a synchronized block.  This does trust the 
programmer a bit more than the current design, but in exchange it 
encourages a programming model that actually makes sense.  It 
doesn't account for the case where I'm calling pthread_mutex_lock 
on an unshared variable though.  Still not sure about that one.
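As a sketch of the rule I'm proposing (this is NOT current D 
semantics -- it's what the compiler would hypothetically enforce 
if shared were non-transitive):

```d
import core.atomic : atomicOp;

// NOT current D semantics -- a sketch of the proposed rule.
class Queue
{
    private shared int m_count;  // declared shared: atomic access required
    private int[]      m_items;  // unshared field of a shared object

    void put(int x) shared
    {
        m_count.atomicOp!"+="(1);   // OK: the field is declared shared

        // m_items ~= x;            // would be rejected: unshared access
        synchronized (this)         // OK: unshared access is allowed
        {                           // inside a synchronized block
            m_items ~= x;
        }
    }
}
```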


>> Sure, but at that point they are no longer referenced by the 
>> shared Thread, correct?
>
> The work items? They stay referenced by the shared Thread
> until it is done with them. In this particular implementation
> an item is moved from the list to a separate field that
> denotes the current item and then the Mutex is released.
> This current item is technically unshared now, because only
> this thread can really see it, but as far as the language is
> concerned there is a shared reference to it because shared
> applies transitively.

Oh I see what you're getting at.  This sort of thing is why 
Thread can be initialized with an unshared delegate.  Since 
join() is an implicit synchronization point, it's completely 
normal to launch a thread that modifies local data, then call 
join and expect the local data to be in a coherent state.  Work 
queues are much the same.
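A minimal example of what I mean (ordinary druntime usage, though 
the variable names are mine):

```d
import core.thread : Thread;

void example()
{
    int[] results;

    // Unshared delegate mutating thread-local data from the
    // spawned thread.
    auto t = new Thread({
        results ~= 42;
    });
    t.start();
    t.join();                   // implicit synchronization point

    assert(results == [42]);    // coherent after join
}
```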


> The same goes for the list of items while it is under the
> Mutex protection.
>
>> The rule is simply that you can't be trying to read or write 
>> data using both shared and unshared operations, because of 
>> that reader-writer contract I mentioned above.
>
> Something along that line yes. The exact formulation may need
> to be ironed out, but what the FAQ says right now doesn't
> work.

Yes.  See my suggestion about shared being non-transitive above.  
I think that's at least in the right ballpark.


> Anything between a single shared basic data type and full
> blown synchronized class is too complicated for the compiler
> to see through. So a simple definition of `shared` like the
> FAQ attempts won't fly. Most methods are "somewhat shared":
>
> private void workOnAnItem() shared
> {
> 	// m_current is technically never shared,
> 	// but we cannot describe this with `shared`.
> 	// Hence I manually unshare where appropriate.
>
> 	synchronized (m_condition.unshared.mutex)
> 	{
> 		m_current = m_list.unshared.front;
> 		m_list.unshared.removeFront();
> 	}
> 	m_current.unshared.doSomething();
> }

As they should be.  This is the correct way to use mutexes.


> You are right, my point was that the original formulation is
> so strict that can only come from the point of view of using
> shared in message passing. It doesn't spend a thought on how a
> shared(Thread) is supposed to be both referable and able to
> unshare internal lists during processing.

And since message passing is encapsulated in an API, we really 
don't need the type system to do anything special.  We can just 
make correct use an artifact of the API itself.


> Count me in. Anecdotally I once tried to see if I can write a
> minimal typed malloc() that is faster than tcmalloc. It went
> way over budget from a single CAS instruction, whereas tcmalloc
> mostly works on thread-local pools.
> These synchronizations stall the CPU _that_ much that I don't
> see how someone writing lock-free algorithms with `shared`
> will accept implicit full barriers placed by the language.
> This is a dead-end to me.

Last time I checked boost::shared_ptr (which admittedly was years 
ago) they'd dropped atomic operations in favor of spins on 
non-atomics.  LOCK has gotten a lot more efficient--I think it's 
on the order of ~75 cycles these days.  And the FENCE operations 
are supposed to work for normal programming now, though it's hard 
to find out whether this is true of all CPUs or only those from 
Intel.  But yes, truly atomic ops are terribly expensive.


> Mostly what I use is load-acquire and store-release, but
> sometimes raw atomic read access is sufficient as well.
>
> So ideally I would like to see:
>
> volatile -> compiler doesn't reorder stuff

Me too.  For example, GCC can optimize around inline assembler.  
I used to have the inline asm code in core.atomic labeled as 
volatile for this reason, but was forced to remove it because 
it's deprecated in D2.


> and on top of that:
>
> atomicLoad/Store -> CPU doesn't reorder stuff in the pipeline
>                     in the way I described by MemoryOrder.xxx
>
> A shared variable need not be volatile, but a volatile
> variable is implicitly shared.

I'm not quite following you here.  Above, I thought you meant the 
volatile statement.  Are you saying we would have both shared and 
volatile as attributes for variables?
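For reference, the load-acquire/store-release pattern mentioned 
above maps onto today's core.atomic like this (a minimal sketch; 
the variable names are invented):

```d
import core.atomic : atomicLoad, atomicStore, MemoryOrder;

shared int  g_data;
shared bool g_ready;

// Producer: write the payload, then release-store the flag.
void produce()
{
    atomicStore!(MemoryOrder.raw)(g_data, 42);
    atomicStore!(MemoryOrder.rel)(g_ready, true);
}

// Consumer: acquire-load the flag; if it's set, the payload
// store is guaranteed visible.
void consume()
{
    if (atomicLoad!(MemoryOrder.acq)(g_ready))
    {
        auto v = atomicLoad!(MemoryOrder.raw)(g_data);
        assert(v == 42);
    }
}
```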

