[dmd-concurrency] word tearing status in today's processors

Kevin Bealer kevinbealer at gmail.com
Wed Jan 27 10:46:28 PST 2010


On Wed, Jan 27, 2010 at 12:36 PM, Michel Fortin
<michel.fortin at michelf.com> wrote:

> Le 2010-01-27 à 11:40, Kevin Bealer a écrit :
>
> > Tier A, the shared operations, could include byte and int64 operations,
> but also anything else you'd like to do with a shared object, such as
> modifying an associative array's contents or swapping the contents of two
> adjacent fields of the same object.
>
> Hum, nice idea, but we need to be careful with that. Locking into the lock
> pool for performing a simple data operation is fine. But since you never
> know in what order each thread will take its locks in the lock pool, no
> thread should be allowed to take a lock while it holds a lock in the lock
> pool, even a separate object lock.


What I was thinking of is two non-overlapping lock pools, and the order
would always be that you either take a (shared) lock in the first pool and
then (an exclusive lock in) the second, or you take only the (exclusive)
lock in the first.  So all lock operations are ordered and safe from
deadlock as long as the thread doesn't go belly up while holding a lock.
If you needed two locks from either pool, you'd take them together, pool
by pool, in order, and you could take both of the disk-page-level locks in
numerical order to avoid an ordering issue there -- I'm thinking of these
locks as only being accessible to this shared-qualifier implementation
layer, not taken from a general pool of locks the user can dip into.  The
disk-page locks are not in the same pool as the 8-byte-range locks.

E.g. updating a shared long on a platform that doesn't have 8-byte atomics,
assuming a secondary protection size of 16 bytes:

    size_t address = 12345;

    get_shared_lock(table1[address / disk_page_size]);
    get_exclusive_lock(table2[address / 16]);

    // modify the long in some way
    *cast(long*) address += 10;

    clear_exclusive_lock(table2[address / 16]);
    clear_shared_lock(table1[address / disk_page_size]);

I was thinking of the disk-page locks as a way to cover any operation that
affects an entire shared struct, no matter how big it is (up to the disk
page size).  If we assume that all ops will be field-wise, then you could
just use a single lock pool at whatever granularity you want to lock at.
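
For concreteness, here's a rough sketch of what the two pools and the
locking sequence above might look like in D.  The pool sizes, the modulo
hashing of addresses into slots, and the ReadWriteMutex/Mutex choice are
all assumptions on my part, just to make the idea compile:

    import core.sync.mutex   : Mutex;
    import core.sync.rwmutex : ReadWriteMutex;

    enum diskPageSize = 4096;   // assumed coarse granularity
    enum poolSize     = 256;    // assumed number of locks per pool

    // table1: coarse pool, one reader/writer lock per disk-page region.
    // table2: fine pool, one plain mutex per 16-byte range.
    __gshared ReadWriteMutex[poolSize] table1;
    __gshared Mutex[poolSize]          table2;

    shared static this()
    {
        foreach (ref m; table1) m = new ReadWriteMutex;
        foreach (ref m; table2) m = new Mutex;
    }

    // Store into an 8-byte field with no native atomic support:
    // shared lock on the page slot, exclusive lock on the 16-byte slot,
    // released in reverse order -- the same sequence as above.
    void storeLong(long* p, long value)
    {
        auto coarse = table1[(cast(size_t) p / diskPageSize) % poolSize];
        auto fine   = table2[(cast(size_t) p / 16) % poolSize];

        coarse.reader.lock();
        fine.lock();

        *p = value;             // the plain, non-atomic store

        fine.unlock();
        coarse.reader.unlock();
    }

The point of the modulo hashing is only that the pools can stay small and
fixed-size; any scheme that maps an address to a stable slot in each pool
would work the same way.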

This should be deadlock-free as long as the set of locks needed for any op
is known before getting any of them, which I think is doable if each
operation applies to one location.  This scheme wouldn't work if you wanted
to combine arbitrary sets of operations under a single atomicity
constraint -- it only works for simply defined operations, like reading a
16-byte value or storing a single byte into a word.

One nice thing about spinlocks is that they don't actually "block", so you
can avoid deadlocks based on lock order anyway -- if you can't get the last
lock you need after N attempts, you can back out the ones you already hold
and wait for a few milliseconds before retrying.  But I was trying to avoid
that kind of complexity.
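
Just to illustrate the back-out idea (not something I'm proposing we
actually do), here's a minimal sketch using a CAS-based spinlock; the
tryAcquire/release/lockBoth names are made up for the example:

    import core.atomic;
    import core.thread : Thread;
    import core.time   : msecs;

    // Non-blocking acquire of a one-word spinlock; true on success.
    bool tryAcquire(shared(bool)* lock)
    {
        return cas(lock, false, true);
    }

    void release(shared(bool)* lock)
    {
        atomicStore(*lock, false);
    }

    // Take two spinlocks or neither: if the second one isn't free,
    // back out of the first and retry later instead of blocking on it.
    void lockBoth(shared(bool)* a, shared(bool)* b)
    {
        for (;;)
        {
            if (tryAcquire(a))
            {
                if (tryAcquire(b))
                    return;        // got both locks
                release(a);        // back out to avoid deadlock
            }
            Thread.sleep(1.msecs); // brief back-off before retrying
        }
    }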


> > Maybe this should be limited to things that might conceivably be possible
> using atomic CPU operations, such as atomically modifying two adjacent
> fields together (I'm thinking of array assignment but it's probably better
> for layer 1 to be conceived in terms of what phyical operation is being
> done), but maybe not something like sorting
> > an array.
>
> As I explained, it needs to be limited to some specific operation to avoid
> deadlocks.
>
> If you see atomic operations *as an optimization*, then for that
> optimization to be possible you need to restrict the lock pool to only what
> is possible using atomics. That's because you can't mix atomic and lock pool
> operations on the same variable. If only one of the allowed operations can't
> be done with atomics, you need to use the lock pool for all others too.
>
Right.  I was thinking that each operation, defined in terms like
"increment an int32", is either a lock-pool op or an atomic op for a given
platform.  If your platform allows a single-byte write without tearing,
then writing shared 'byte' fields should be done using that mechanism.  If
it doesn't support 8-byte atomic operations, then those fields have to be
handled with the lock-pool mechanism in all cases.

This implies that every piece of the data is part of exactly one protection
domain -- e.g. if you have an 8-byte region of memory, you need to treat it
either as one long or as two ints; you can't mix the two.  It also implies
that if you want byte-level access and don't have it natively, objects
falling in the same word need to share a locking strategy -- so maybe every
struct would need a map of which fields are locked and which are atomic.
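
To illustrate the "map of fields" idea, here's a small sketch along the
lines of the Atomic!T / Locked!T templates you describe below; the wrapper
definitions and the hasNativeAtomic test are stand-ins I'm inventing for
the example, not real library code:

    import core.atomic;
    import core.sync.mutex : Mutex;

    // Minimal stand-ins for the proposed wrappers.
    struct Atomic(T)
    {
        shared T value;
        void increment() { atomicOp!"+="(value, 1); }   // lock-free
    }

    struct Locked(T)
    {
        __gshared Mutex lock;              // simplified: one lock per type
        shared static this() { lock = new Mutex; }
        T value;
        void increment() { lock.lock(); ++value; lock.unlock(); }
    }

    // Assumed platform test: native atomics only up to word size.
    enum bool hasNativeAtomic(T) = T.sizeof <= size_t.sizeof;

    // The per-field "map": each field picks its strategy at compile time.
    struct SharedRecord
    {
        static if (hasNativeAtomic!int)
            Atomic!int counter;            // native atomics
        else
            Locked!int counter;            // lock-pool fallback

        static if (hasNativeAtomic!long)
            Atomic!long total;             // 64-bit targets
        else
            Locked!long total;             // 32-bit targets
    }

On a 32-bit target this picks Locked!long for the 8-byte field and
Atomic!int for the 4-byte one, which matches the "every piece of the data
is part of exactly one protection domain" idea above.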


> Note that I said "as an optimization". If using the lock pool is done
> explicitly, the programer can ensure correct usage:
>
>        shared int x;
>        shared long y;
>
>        ++atomic(x); // x always accessed as atomic
>        ++locked(y); // y always accessed as locked
>
> Although in this case it might be safer if this was part of the type
> information, perhaps through a template:
>
>        shared Atomic!int x;
>        shared Locked!long y;
>
>        ++x; // x always accessed as atomic
>        ++y; // y always accessed as locked
>
> Static if could be used to choose between Atomic!T and Locked!T when
> specific atomic capabilities are missing:
>
>        // check for atomic increment
>        static if (is(Atomic!long()++))
>                Atomic!long y;
>        else
>                Locked!long y;
>
>
> --
> Michel Fortin
> michel.fortin at michelf.com
> http://michelf.com/
>