Low-Lock Singletons In D

Tue May 7 09:14:46 PDT 2013

On Tue, 07 May 2013 11:30:12 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail at erdani.org> wrote:

> On 5/7/13 10:31 AM, Steven Schveighoffer wrote:
>> On Tue, 07 May 2013 09:25:36 -0400, Andrei Alexandrescu
>> <SeeWebsiteForEmail at erdani.org> wrote:
>>
>>> No. A tutorial on memory consistency models would be too long to
>>> insert here. I don't know of a good online resource, does anyone?
>>
>> In essence, a read requires an acquire memory barrier and a write requires
>> a release memory barrier, but in this case we only need to be concerned
>> if the value we get back is not valid (i.e. NullValue).
>>
>> Once in steady state, there is no need to acquire (as long as the write
>> is atomic, the read value will be either NullValue or ActualValue, not
>> something else).
>
> There's always a need to acquire so as to figure out whether the steady
> state has been entered.

Not really.  Whether it is entered or not is dictated by the value.  Even
classic double-checked locking doesn't need an acquire outside the lock.
Even if your CPU's view of the variable is outdated, the check after the
memory barrier inside the lock occurs only once.  After that, steady state
is achieved.  All subsequent reads need no memory barriers, because the
singleton object will never change after that.

The only things we need to guard against are non-atomic writes and
out-of-order writes of the static variable (fixed with a memory barrier).
Instruction ordering OUTSIDE the lock is irrelevant, because if we don't
get the "steady state" value (not null), then we go into the lock to
perform the careful initialization with barriers.
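For comparison, here is a minimal sketch of classic double-checked locking written in C++11, whose std::atomic makes the barriers explicit (this is not D code, and `Widget` / `WidgetSingleton` are hypothetical names). The release store inside the lock is the "memory barrier" that keeps the object's fields from being published after the pointer. The fast-path load is written as an acquire load because that is the portable form under the C++ memory model; on x86 an acquire load compiles to an ordinary load, so the steady-state cost is still just one read.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical payload type, used only for illustration.
struct Widget {
    int value = 42;
};

class WidgetSingleton {
public:
    static Widget& get() {
        // Fast path: a non-null pointer means steady state has been reached.
        Widget* p = instance_.load(std::memory_order_acquire);
        if (p == nullptr) {
            std::lock_guard<std::mutex> guard(mutex_);
            // Re-check under the lock: another thread may have won the race.
            // The mutex already orders this load after any earlier store.
            p = instance_.load(std::memory_order_relaxed);
            if (p == nullptr) {
                p = new Widget();  // intentionally never freed (singleton)
                // Release store: Widget's fields become visible to other
                // threads no later than the pointer itself.
                instance_.store(p, std::memory_order_release);
            }
        }
        return *p;
    }

private:
    static std::atomic<Widget*> instance_;
    static std::mutex mutex_;
};

std::atomic<Widget*> WidgetSingleton::instance_{nullptr};
std::mutex WidgetSingleton::mutex_;
```

Every call after the first returns the same object without touching the mutex.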

I think aligned native word writes are atomic, so we don't have to worry  
about that.
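In C++ the "aligned native word writes are atomic" assumption can be checked at compile time instead of relying on the CPU manual. A small sketch (the `Node` type is hypothetical): `ATOMIC_POINTER_LOCK_FREE == 2` means pointer-sized atomics are always implemented natively by the hardware, without a hidden lock.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical type whose pointer we want to publish atomically.
struct Node {
    int payload;
};

// 2 = always lock-free: pointer loads and stores need no internal mutex.
static_assert(ATOMIC_POINTER_LOCK_FREE == 2,
              "pointer-sized atomics are not always lock-free on this target");

// Runtime query for the same property on a concrete object.
bool pointer_atomics_lock_free() {
    std::atomic<Node*> p{nullptr};
    return p.is_lock_free();
}
```

std::atomic never lets a reader see a torn pointer either way; the check only confirms that no lock is hiding behind it.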

But I think we've spent enough time on this solution.  Yes, double-checked
locking can be done, but David's pattern is far easier to implement,
understand, and explain.  It comes at the small cost of checking a boolean
before each access of the initialized data, and his benchmarks show a very
small performance penalty.  Another LARGE benefit is that you don't have to
pull out your obscure (possibly challenged) memory model book/blog post or
the CPU spec to prove it :)
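As we understand it, David's pattern keeps the boolean in thread-local storage (D's default for statics) and keeps the instance in a shared global, so each thread takes the lock only on its first access. A rough C++ translation of that idea might look like this. It is our reading of the pattern, not David's code, and `Config` / `ConfigSingleton` are hypothetical names.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical payload type, used only for illustration.
struct Config {
    int level = 7;
};

class ConfigSingleton {
public:
    static Config& get() {
        // Each thread checks its own flag, so the fast path reads no
        // shared state that could be stale or half-written.
        if (!instantiated_) {
            std::lock_guard<std::mutex> guard(mutex_);
            // The mutex orders the first thread's write of instance_
            // before every other thread's first read of it.
            if (instance_ == nullptr) {
                instance_ = new Config();  // intentionally never freed
            }
            instantiated_ = true;
        }
        // Safe without a barrier: instance_ never changes after creation,
        // and this thread has synchronized through the mutex once.
        return *instance_;
    }

private:
    static thread_local bool instantiated_;  // per-thread flag
    static Config* instance_;                // shared, written under mutex_
    static std::mutex mutex_;
};

thread_local bool ConfigSingleton::instantiated_ = false;
Config* ConfigSingleton::instance_ = nullptr;
std::mutex ConfigSingleton::mutex_;
```

The steady-state cost is the thread-local flag check plus a read of the shared pointer, with no atomics or fences involved.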

Hmm... you might be able to mitigate the penalty by storing the actual  
object reference instead of a bool in the _instantiated variable.  Then a  
separate load is not required.  David?
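A sketch of that suggestion, again in C++ with hypothetical names (`Logger`, `LoggerSingleton`): the thread-local slot holds the object pointer itself, so a non-null value doubles as the "instantiated" flag and the fast path is a single thread-local load.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical payload type, used only for illustration.
struct Logger {
    int id = 1;
};

class LoggerSingleton {
public:
    static Logger& get() {
        // Non-null means this thread has already synchronized once.
        Logger* p = cached_;
        if (p == nullptr) {
            std::lock_guard<std::mutex> guard(mutex_);
            if (instance_ == nullptr) {
                instance_ = new Logger();  // intentionally never freed
            }
            p = instance_;
            cached_ = p;  // later calls on this thread skip the lock
        }
        return *p;
    }

private:
    static thread_local Logger* cached_;  // per-thread copy of the pointer
    static Logger* instance_;             // shared, written under mutex_
    static std::mutex mutex_;
};

thread_local Logger* LoggerSingleton::cached_ = nullptr;
Logger* LoggerSingleton::instance_ = nullptr;
std::mutex LoggerSingleton::mutex_;
```

Compared with the boolean version, the steady-state path no longer reads the shared pointer at all; whether that is measurable would depend on the benchmark.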

-Steve