Something needs to happen with shared, and soon.

Dmitry Olshansky dmitry.olsh at gmail.com
Fri Nov 16 10:56:28 PST 2012


On 11/16/2012 5:17 PM, Michel Fortin wrote:
> On 2012-11-15 16:08:35 +0000, Dmitry Olshansky <dmitry.olsh at gmail.com>
> said:
>> While the rest of the proposal was more or less fine, I don't get why we
>> need escape control of the mutex at all - in any case it just opens a
>> possibility to shoot yourself in the foot.
>
> In case you want to protect two variables (or more) with the same mutex.
> For instance:
>
>      Mutex m;
>      synchronized(m) int next_id;
>      synchronized(m) Object[int] objects_by_id;
>

Wrap them in a struct and it would be even clearer and safer:

struct ObjectRepository {
	int next_id;
	Object[int] objects_by_id;
}
// ...or whatever that combination represents
synchronized ObjectRepository objRepo;


>      int addObject(Object o)
>      {
>          synchronized(next_id, objects_by_id)

...synchronized(objRepo) with(objRepo)...
Though I'd rather use the struct directly.

>              return objects_by_id[next_id++] = o;
>      }
>
> Here it doesn't make sense and is less efficient to have two mutexes,
> since every time you need to lock on next_id you'll also want to lock on
> objects_by_id.
>

Yes. But we shouldn't close our eyes to the rest of the language when 
deciding how to implement this. Moreover, it makes more sense to pack 
related things (everything under a single lock) into a separate entity.

> I'm not sure how you could shoot yourself in the foot with this. You
> might get worse performance if you reuse the same mutex for too many
> things, just like you might get better performance if you use it wisely.
>

Easily - the mutex is a separate entity, and there is no guarantee that 
it won't get used for something other than what was intended. The 
declaration implies the connection, but I don't see anything preventing 
abuse.

>
>> But anyway we can make it in the library right about now.
>>
>> synchronized T ---> Synchronized!T
>> synchronized(i){ ... } --->
>>
>> i.access((x){
>> //will lock & cast away shared T inside of it
>>     ...
>> });
>>
>> I fail to see what it doesn't solve (aside of syntactic sugar).
>
> It solves the problem too. But it's significantly more inconvenient to
> use. Here's my example above redone using Synchronized!T:
>
>      Synchronized!(Tuple!(int, Object[int])) objects_by_id;
>
>      int addObject(Object o)
>      {
>          int id;
>          objects_by_id.access((obj_by_id){
>              id = obj_by_id[1][obj_by_id[0]++] = o;
>          });
>          return id;
>      }
>
> I'm not sure if I have to explain why I prefer the first one or not, to
> me it's pretty obvious.

If we made a tiny change in the language that allowed a different syntax 
for passing delegates, my version would shine. Such a change would at the 
same time enable a nicer way to abstract away control flow.

Imagine:

access(object_by_id){
	...	
};

to be convertible to:

(x){with(x){
	...
}}(access(object_by_id));

More generally speaking, a lowering:

expression { ... }
-->
(x){with(x){ ... }}(expression);

AFAIK it doesn't conflict with anything.

Or wait a sec. There's an even simpler idiom that needs no extra features.
Drop the idea of 'access' taking a delegate. The other library idiom is 
to return a RAII proxy that locks/unlocks an object on 
construction/destruction.

with(lock(object_by_id))
{
	... do what you like
}


Fine by me. And C++ can't do it ;)
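
Roughly, such a proxy can be put together in the library today. A minimal 
sketch (Guard and lock are names I'm making up here; a real version would 
also have to deal with shared and with preventing escapes):

import core.sync.mutex : Mutex;

// the ObjectRepository struct from above
struct ObjectRepository { int next_id; Object[int] objects_by_id; }

struct Guard(T)
{
	private T* payload;
	private Mutex mtx;

	@disable this(this);	// the guard is not copyable

	this(T* payload, Mutex mtx)
	{
		this.payload = payload;
		this.mtx = mtx;
		mtx.lock();		// lock taken on construction
	}

	~this()
	{
		if (mtx !is null)
			mtx.unlock();	// lock released when the guard dies
	}

	ref T get() { return *payload; }
	alias get this;			// member access goes through the guard
}

Guard!T lock(T)(ref T obj, Mutex m)
{
	return Guard!T(&obj, m);
}

void example(ref ObjectRepository objRepo, Mutex m)
{
	auto repo = lock(objRepo, m);	// locked here
	repo.objects_by_id[repo.next_id++] = new Object;
}	// repo goes out of scope -> unlocked

Whether with() itself sees through alias this is a separate question, so 
the sketch above sticks to plain member access through the guard.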


>> The key point is that Synchronized!T is otherwise an opaque type.
>> We could pack a few other simple primitives like 'load', 'store' etc.
>> All of them will go through lock-unlock.
>
> Our proposals are pretty much identical. Yours works by wrapping a
> variable in a struct template, mine is done with a policy object/struct
> associated with a variable. They'll produce the same code and impose the
> same restrictions.

I kind of wanted to point out this disturbing thought about your 
proposal: a lot of extra syntax and rules gets added, and it buys us only 
a very small gain - prettier syntax.

>
>> Even escaping a reference can be solved by passing inside of 'access'
>> a proxy of T. It could even assert that the lock is indeed locked.
>
> Only if you can make a proxy object that cannot leak a reference. It's
> already not obvious how to not leak the top-level reference, but we must
> also consider the case where you're protecting a data structure with the
> mutex and get a pointer to one of its part, like if you slice a container.
>
> This is a hard problem. The language doesn't have a solution to that
> yet. However, having the link between the access policy and the variable
> known by the compiler makes it easier to patch the hole later.
>

It need not be 100% malicious-dumbass proof; basic foolproofness is OK.
See my sketch, which could be vastly improved:
https://gist.github.com/4089706

See also Ludwig's work, though he is focused on classes and their 
monitor mutex.

> What bothers me currently is that because we want to patch all the holes
> while not having all the necessary tools in the language to avoid
> escaping references, we just make using mutexes and things alike
> impossible without casts at every corner, which makes things even more
> bug prone than being able to escape references in the first place.
Well, it's kind of double-edged.

However, I do think we need more general tools in the language and niche 
ones in the library - precisely because you can pack tons of niche and 
miscellaneous stuff on the bookshelf ;)

Locks & co. are niche stuff that enables a lot of more common things.

> There are many perils in concurrency, and the compiler cannot protect
> you from them all. It is of the uttermost importance that code dealing
> with mutexes be both readable and clear about what it is doing. Casts in
> this context are an obfuscator.
>

See below about high-level primitives. The code dealing with mutexes has 
to be small and isolated anyway. Encouraging the pattern of 'just grab 
the lock and you are golden' is even worse (because it won't break as 
fast and hard as e.g. naive atomics will).


>> That and clarifying explicitly what guarantees (aside from being
>> well.. being shared) it provides w.r.t. memory model.
>>
>> Until reaching this thread I was under impression that shared means:
>> - globally visible
>> - atomic operations for stuff that fits in one word
>> - sequentially consistent guarantee
>> - any other forms of access are disallowed except via casts
>
> Built-in shared(T) atomicity (sequential consistency) is a subject of
> debate in this thread. It is not clear to me what will be the
> conclusion, but the way I see things atomicity is just one of the many
> policies you may want to use for keeping consistency when sharing data
> between threads.
>
> I'm not thrilled by the idea of making everything atomic by default.
> That'll lure users to the bug-prone expert-only path while relegating
> the more generally applicable protection systems (mutexes) as a
> second-class citizen.

That's why I think people shouldn't have to use mutexes explicitly at 
all. Instead, provide folks with blocking queues, Synchronized!T, 
concurrent containers (e.g. a hash map) and whatnot. Even Java has some 
useful incarnations of these.
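
For illustration only, a bare-bones Synchronized!T in that spirit might 
look like this (the names and details are a sketch, not the gist linked 
earlier):

import core.sync.mutex : Mutex;

struct Synchronized(T)
{
	// a full version would keep the payload shared and cast that away
	// only while the lock is held
	private T payload;
	private Mutex mtx;

	this(T initial)
	{
		payload = initial;
		mtx = new Mutex;
	}

	// lock, hand the payload to the delegate, unlock on the way out
	void access(scope void delegate(ref T) dg)
	{
		mtx.lock();
		scope(exit) mtx.unlock();
		dg(payload);
	}
}

struct ObjectRepository { int next_id; Object[int] objects_by_id; }

// mirrors the addObject example from earlier in the thread
int addObject(ref Synchronized!ObjectRepository repo, Object o)
{
	int id;
	repo.access((ref ObjectRepository r) {
		id = r.next_id++;
		r.objects_by_id[id] = o;
	});
	return id;
}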

> I think it's better that you just can't do
> anything with shared, or that shared simply disappear, and that those
> variables that must be shared be accessible only through some kind of
> access policy. Atomic access should be one of those access policies, on
> an equal footing with other ones.

This is where casts would be a most unwelcome obfuscator, and there is no 
sensible way to de-obscure them by using higher-level primitives. Having 
to say Atomic!X is workable, though.
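
Something along these lines would do, sketched on top of core.atomic (the 
Atomic wrapper itself is made up here; just load/store/op on integrals):

import core.atomic : atomicLoad, atomicStore, atomicOp;
import std.traits : isIntegral;

struct Atomic(T)
	if (isIntegral!T)
{
	private shared T value;

	T load() { return atomicLoad(value); }		// seq-cst by default
	void store(T v) { atomicStore(value, v); }
	T add(T v) { return atomicOp!"+="(value, v); }	// returns the new value
}

// usage:
//	Atomic!int counter;
//	counter.store(1);
//	auto n = counter.add(41);	// 42, unless someone else got there first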

>
> But if D2 is still "frozen" -- as it was meant to be when TDPL got out
> -- and only minor changes can be made to it now, I don't see much hope
> for its concurrency model. Your Synchronized!T and Atomic!T wrappers
> might be the best thing we can hope for, but they're nothing to set D
> apart from its rivals (I could implement that easily in C++ for instance).

Yeah, but we may tweak some syntax in terms of a lowering or two. I'm of 
the strong opinion that lock-based multi-threading needs no _specific_ 
built-in support in the language.

The case is niche and hardly useful beyond helping to build safe 
high-level primitives in the library; client code doesn't care that much.
Compared to C++ there is one big thing: no-shared by default. This alone 
should be immensely helpful, especially when dealing with 3rd-party 
libraries that 'try hard to be thread-safe' except that they usually 
are not.

-- 
Dmitry Olshansky

