[dmd-concurrency] real

Andrei Alexandrescu andrei at erdani.com
Fri Jan 29 08:13:52 PST 2010


Kevin Bealer wrote:
> I like this analysis in principle, but option #3 cites a factor of 100x 
> slower. Has this really been tested?

Great point. Don't decide until you test, particularly when the decision 
is so far-reaching.

> I'll grant that full pthreads-style 
> mutexes, which are function calls with a lot of overhead and logic 
> built into them, not to mention system calls in some cases, are pretty 
> darn slow.  But once we assume that atomics require a memory barrier of 
> some kind on read, and also that a simple spinlock is good enough for a 
> mutex, I wonder if the slowdown is that large.  Contrast these two 
> designs to implement "shared real x; x = x + 1;"
> 
> No magic:
> 
>    <memory barrier>
>    CAS loop to do x = x + 1
>    <memory barrier>
> 
> Versus emulated:
> 
>    <memory barrier>
>    register int sl_index = int(& x) & 0xFF;
>    CAS loop to set _spinlock_[sl_index] from 0 to 1
>    x = x + 1 // no CAS needed here this time
>    _spinlock_[sl_index] = 0 // no CAS needed to unlock
>    <memory barrier>
> 
> I assume some of these memory barriers are not needed, but is the second 
> design really 100x slower?  I'd think the CAS is the slowest part 
> followed by the memory barrier, and the rest is fairly minor, right?  
> The sl_index calculation should be cheap since &x must be in a register 
> already.

This is an excellent point, Kevin. Could someone on this list write and 
run some test code pronto? Pretty please? With sugar on top?
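
As a possible starting point for such a test, here is a minimal C++ sketch 
of the two designs Kevin contrasts. It is not D and not a proposal for how 
dmd should lower shared reads and writes. Assumptions: `Real` is `double`, 
a stand-in for D's `real`, because an atomic 80-bit value may not be 
lock-free and may need extra library support; acquire/release orderings 
stand in for the `<memory barrier>` lines; and the names `add_one_cas`, 
`add_one_spinlock`, `g_spinlocks`, `run_threads`, and `ns_per_op` are all 
hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>
#include <vector>

// Assumption: double stands in for D's `real` so the CAS stays lock-free.
using Real = double;

// Design 1 ("no magic"): CAS loop directly on the shared value.
void add_one_cas(std::atomic<Real>& x) {
    Real old = x.load(std::memory_order_relaxed);
    // On failure compare_exchange_weak refreshes `old`, so we just retry.
    while (!x.compare_exchange_weak(old, old + 1,
                                    std::memory_order_acq_rel,
                                    std::memory_order_relaxed)) {
    }
}

// Design 2 ("emulated"): a table of 256 spinlocks indexed by address bits.
std::atomic<int> g_spinlocks[256] = {};

void add_one_spinlock(Real& x) {
    // Same index computation as in the pseudocode. Note that for aligned
    // values the lowest address bits are always zero, so only some of the
    // 256 slots are ever used with this exact mask.
    std::uintptr_t idx = reinterpret_cast<std::uintptr_t>(&x) & 0xFF;
    std::atomic<int>& lock = g_spinlocks[idx];

    int expected = 0;
    // CAS loop to take the lock: 0 -> 1.
    while (!lock.compare_exchange_weak(expected, 1,
                                       std::memory_order_acquire,
                                       std::memory_order_relaxed)) {
        expected = 0;  // a failed CAS overwrote `expected`; reset it
    }
    x = x + 1;                                 // plain update under the lock
    lock.store(0, std::memory_order_release);  // unlock without a CAS
}

// Runs `op` `iters` times on each of `n_threads` threads and joins them.
template <class F>
void run_threads(int n_threads, long iters, F op) {
    std::vector<std::thread> threads;
    for (int t = 0; t < n_threads; ++t) {
        threads.emplace_back([=] {
            for (long i = 0; i < iters; ++i) op();
        });
    }
    for (auto& th : threads) th.join();
}

// Crude single-threaded timing: average nanoseconds per call of `op`.
template <class F>
double ns_per_op(F op, long iters) {
    assert(iters > 0);
    auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < iters; ++i) op();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::nano>(stop - start).count()
           / static_cast<double>(iters);
}
```

To compare the designs, one could call `ns_per_op` with a lambda around 
each `add_one_*` function for the uncontended case, and time `run_threads` 
with several threads for the contended case. Any numbers depend heavily on 
the CPU, the compiler, and the contention level, so this only gives a 
rough first answer to the 100x question.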

Also note that shared array assignment is more important to look at than 
real.

Andrei

