What exactly shared means?

Fri Jan 2 05:14:03 PST 2015

On Friday, January 02, 2015 11:47:46 Daniel Kozak via Digitalmars-d-learn wrote:
> I always think that shared should be used to make a variable global
> across threads (similar to __gshared) with some synchronization
> protection. But this code doesn't work (app is stuck on _aaGetX
> or _aaRehash ):
>
> import std.math : log;
> import std.parallelism : parallel;
>
> shared double[size_t] logsA;
>
> void main() {
>
>      auto logs = new double[1_000_000];
>
>      foreach(i, ref elem; parallel(logs, 4)) {
>          elem = log(i + 1.0);
>          logsA[i]= elem;
>      }
> }
>
>
> But when I add synchronized block it is OK:
>
> import std.math : log;
> import std.parallelism : parallel;
>
> shared double[size_t] logsA;
>
> void main() {
>
>      auto logs = new double[1_000_000];
>
>      foreach(i, ref elem; parallel(logs, 4)) {
>          elem = log(i + 1.0);
>          synchronized {
>              logsA[i]= elem;
>          }
>      }
> }

Objects in D default to being thread-local. __gshared and shared both make
it so that they're not thread-local. __gshared does it without actually
changing the type, which makes it easier to use but also dangerous: the
compiler still treats the variable as thread-local with regards to
optimizations and whatnot, so it's easy to violate its guarantees. It's
really only meant for use with C global variable declarations,
but plenty of folks end up using it for more, because it avoids having the
compiler complain at them like it does with shared. Regardless, if you use
__gshared, you need to make sure that you protect it against being accessed
by multiple threads at once using mutexes or synchronized blocks or whatnot.
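
As a minimal sketch of that point (the names counter, counterLock and work
are made up for illustration, and this is just one way to do the locking),
the following shows that __gshared leaves the type untouched and that the
protection is entirely up to you:

```d
import core.sync.mutex : Mutex;
import core.thread : Thread;

__gshared int counter;        // one instance for the whole process
__gshared Mutex counterLock;  // hypothetical lock guarding counter

// __gshared does not change the type: the compiler sees a plain int.
static assert(is(typeof(counter) == int));

void work()
{
    foreach (_; 0 .. 10_000)
    {
        // Nothing forces us to take this lock. Without it, the
        // increments from the two threads could race.
        counterLock.lock();
        scope (exit) counterLock.unlock();
        ++counter;
    }
}

void main()
{
    counterLock = new Mutex;

    auto threads = [new Thread(&work), new Thread(&work)];
    foreach (t; threads) t.start();
    foreach (t; threads) t.join();

    assert(counter == 20_000);
}
```

Remove the lock and the code still compiles without complaint, which is
exactly the danger.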

shared does not add any more synchronization or automatic mutex-locking or
anything like that than __gshared does (IIRC, there is some talk in TDPL
about shared adding memory barriers - which __gshared wouldn't do - but that
hasn't been implemented and probably never will be, because it would be too
expensive with regards to efficiency). However, unlike __gshared, shared
_does_ alter the type of the variable, so the compiler will treat it
differently. That way, it won't do stuff like optimize code under the
assumption that the object is thread-local like it can do with non-shared
objects. It makes it clear which objects are thread-local and which aren't
and enforces that with the type system. In principle, this is great, since
it clearly separates thread-local and non-thread-local objects and protects
you against treating a non-thread-local object as if it were thread-local.
And as long as you're writing code which operates specifically on shared
variables rather than trying to use "normal" code with them, it works great.
The problem is that you inevitably want to do things like use a function
that takes thread-local variables on a shared object - e.g. if a type is
written to be used as thread-local, then none of its member functions are
shared, and none of them can be used by a shared object, which obviously
makes using such a type as shared a bit of a pain.
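
A small sketch of what that looks like in practice (Counter, local and
global are hypothetical names used only for this illustration):

```d
struct Counter
{
    int value;

    // Written for thread-local use: not marked shared.
    void increment() { ++value; }
}

Counter local;          // thread-local by default
shared Counter global;  // visible to all threads, and typed as such

// Unlike __gshared, shared is part of the type.
static assert(is(typeof(local) == Counter));
static assert(is(typeof(global) == shared(Counter)));

// A non-shared member function cannot be called on a shared object...
static assert(!__traits(compiles, global.increment()));
// ...and a pointer to a shared object does not implicitly convert to a
// pointer to a thread-local one.
static assert(!__traits(compiles, { Counter* p = &global; }));

void main()
{
    local.increment();  // fine: thread-local object, thread-local code
    assert(local.value == 1);
}
```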

In principle, D is supposed to provide ways to safely convert shared objects
to thread-local ones - i.e. when the compiler can guarantee that the object
is protected by a mutex or synchronized block or whatnot. The main way that
this was proposed is what TDPL describes with regards to synchronized
classes. The member variables of a synchronized class would be shared, and
protected by the class, since all of its member functions would be
synchronized, and no direct access to the member variables would be allowed,
guaranteeing that any time the member variables were accessed, it would be
within a synchronized function, meaning that the compiler could guarantee
that all access to the member variables was protected by a mutex. So, the
compiler would then be able to safely strip away the outermost layer of
shared, allowing you to theoretically use the member variables with normal
functions.

However, that only strips away the outermost layer (since that's all the
compiler could guarantee was protected), which frequently wouldn't be
enough, and it requires creating entire synchronized types just to use
shared objects. So, the efficacy of the idea is questionable IMHO, much as
the motivation is a good one (only removing shared when the compiler can
guarantee that only one thread can access it). However, synchronized
classes have yet to be implemented (only synchronized functions), so we
don't currently have the ability to have the outermost layer of shared be
stripped away like that. There is currently no place in the language where
the compiler is able to guarantee that a shared object is sufficiently
protected against access from multiple threads at once for it to be able to
automatically remove shared under any circumstances.

The _only_ way to strip it away at this point is to cast it away explicitly.
So, right now, what you're forced to do is something like

shared T foo = funcThatReturnsSharedT();

synchronized(someObj)
{
    // be sure at this point that all other code that accesses foo
    // also synchronizes on someObj before accessing it.

    auto bar = cast(T)foo;

    // do something with bar like call normal member functions or
    // call normal free functions on it that don't take shared.

    // be sure at this point that no thread-local references to foo/bar
    // remain once this synchronized block is done with it. All references
    // to it outside the synchronized block must be shared.
}

// now, there should only be shared references to foo.
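
Applied to the code from the original question, that pattern might look
something like the sketch below. The lock object logsLock and the pointer
name unshared are made up for the example, and the array is smaller than in
the question just to keep it quick. Note that the cast goes through a
pointer: casting the associative array itself by value while it's still
empty would give you a separate AA, and the insertions would never show up
in logsA.

```d
import std.math : log;
import std.parallelism : parallel;

shared double[size_t] logsA;

void main()
{
    // Hypothetical lock object: every access to logsA synchronizes on it.
    auto logsLock = new Object;

    auto logs = new double[1_000];

    foreach (i, ref elem; parallel(logs, 4))
    {
        elem = log(i + 1.0);

        synchronized (logsLock)
        {
            // Cast shared away through a pointer so that insertions land
            // in logsA itself and not in a copy of a still-empty AA.
            auto unshared = cast(double[size_t]*) &logsA;
            (*unshared)[i] = elem;
            // unshared goes out of scope here; it must not escape.
        }
    }

    synchronized (logsLock)
    {
        auto unshared = cast(double[size_t]*) &logsA;
        assert((*unshared).length == logs.length);
    }
}
```

Nothing in the compiler checks that every access really does go through
logsLock. That part is still on you.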

Obviously, this is error-prone in that it's up to you to make sure that all
accesses to the shared object are protected and that no thread-local
reference to it escapes a synchronized block. Ideally, the compiler would be
able to determine that shared could be stripped from the object within the
synchronized block, but it has no way of knowing that all other references
to it are properly protected as well (unlike it would with synchronized
classes if they existed), so it can't do that. It's up to you to explicitly
protect access to the shared variable and to make sure that no thread-local
references to it escape that protection.

So, ultimately, when using shared, you currently have one of two options:

1. Only ever use a shared object with code that is specifically written for
shared objects (e.g. classes where all of the member variables are shared).
This avoids having to cast away shared, but you still need to use
synchronized blocks or mutexes to protect access to any shared objects, and
it can mean having to duplicate code that works with thread-local objects.

2. Cast away shared within a synchronized block (or when a mutex is locked)
like in the example above.
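
For the first option, a rough sketch of a type written specifically to be
used as shared could look like this. HitCounter, record, total and worker
are hypothetical names, and for a single integer the sketch uses core.atomic
in place of a mutex or synchronized block, which is a substitution on my
part and only works for simple cases like this one:

```d
import core.atomic : atomicLoad, atomicOp;
import core.thread : Thread;

struct HitCounter
{
    private int hits;

    // shared member functions are the ones callable on a shared object.
    // Inside them, hits is typed as shared(int).
    void record() shared
    {
        atomicOp!"+="(hits, 1);
    }

    int total() shared
    {
        return atomicLoad(hits);
    }
}

shared HitCounter counter;

void worker()
{
    foreach (_; 0 .. 1_000)
        counter.record();
}

void main()
{
    auto threads = [new Thread(&worker), new Thread(&worker)];
    foreach (t; threads) t.start();
    foreach (t; threads) t.join();

    assert(counter.total() == 2_000);
}
```

No casts are needed, but none of this code can be reused for a thread-local
HitCounter without writing non-shared versions of the same functions.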

Ideally, the situation would be better than this, and we're pretty much all
in agreement that we want to improve it, but we have yet to actually come up
with a better solution. The unfortunate result is that a lot of folks
just use __gshared rather than write code explicitly for shared objects or
cast away shared within synchronized blocks. But while using shared
"properly" is currently far more annoying than it should be, IMHO it's well
worth the extra protection you get of knowing when objects are shared or
thread-local. It only comes at the cost of having to make sure that all
accesses to the variable are properly protected by a mutex or synchronized
block and having to cast away shared within that area of protection, and
except for the casting, that's exactly what you have to do in languages like
C++ and Java anyway, except that in D, you know exactly what code involves
shared objects, and it's nicely segregated, whereas in C++, it could easily
be anywhere in your code and you wouldn't know it, because it's not part of
the type system at all. So, even if shared is not yet where we want it to
be, it's still a significant improvement over the likes of C++ and Java
IMHO. But hopefully, the situation in D will improve in the future so that
using shared isn't quite as unwieldy.

- Jonathan M Davis