Atomic Ref Counting

Sun Nov 21 18:03:52 PST 2010

On Sunday 21 November 2010 17:59:35 dsimcha wrote:
> This message originally appeared on the Phobos list, but I've grown
> impatient waiting for responses and decided it needs to see a wider
> audience:
> 
> Now that I have time, I did some benchmarks of atomic ref counting.  It is
> somewhat expensive, but not obscenely so.  Here's my benchmark:
> 
> import std.stdio, std.datetime, std.container;
> 
> void main() {
>     Array!uint arr;
>     arr.length = 4;
> 
>     auto sw = StopWatch(autoStart);
>     foreach(i; 0..10_000_000) {
>         auto arr2 = arr;
>     }
> 
>     writeln(sw.peek.milliseconds);
> }
> 
> Here's a diff of std.typecons:
> 
> Index: typecons.d
> ===================================================================
> --- typecons.d    (revision 2181)
> +++ typecons.d    (working copy)
> @@ -2328,7 +2328,16 @@
>      this(this)
>      {
>          if (!RefCounted.isInitialized) return;
> -        ++RefCounted._store._count;
> +        //++RefCounted._store._count;
> +        auto p = &(RefCounted._store._count);
> +        asm {
> +            push EAX;
> +            mov EAX, p;
> +            lock;
> +            inc dword ptr[EAX];
> +            pop EAX;
> +        }
> +
>          debug(RefCounted) if (RefCounted.debugging)
>                   writeln(typeof(this).stringof,
>                  "@", cast(void*) RefCounted._store, ": bumped refcount to
> ", @@ -2345,7 +2354,23 @@
>      {
>          if (!RefCounted._store) return;
>          assert(RefCounted._store._count > 0);
> -        if (--RefCounted._store._count)
> +
> +        // Can't just do lock; dec; here because I need the value
> afterwords. +        size_t oldVal = void;
> +        auto p = &(RefCounted._store._count);
> +        asm {
> +            push EAX;
> +            push EBX;
> +            mov EAX, size_t.max;  // + size_t.max is equivalent to - 1.
> +            mov EBX, p;
> +            lock;
> +            xadd [EBX], EAX;
> +            mov oldVal, EAX;
> +            pop EBX;
> +            pop EAX;
> +        }
> +
> +        if (oldVal > 1)
>          {
>              debug(RefCounted) if (RefCounted.debugging)
>                       writeln(typeof(this).stringof,
> 
> Note the direct use of ASM.  Between function call overhead and the fact
> that core.atomic uses a CAS loop, using it instead would have about
> doubled the overhead.  I'm not sure how we'll factor this when/if this
> hits production code.
> 
> Anyhow, the timings are:
> 
> Atomic:  258 milliseconds total (258 nanoseconds per iteration)
> Non-atomic:  88 milliseconds  (88 nanoseconds per iteration)
> 
> I think this is a small enough overhead that we should just make all
> reference counting atomic.  This would make the race condition pointed out
> by Michael Fortin completely moot IIUC.  It would also make it possible to
> safely share reference counted containers across threads in
> shared/synchronized wrappers, etc.
> 
> Furthermore, the cost of atomic reference counting can be mitigated by the
> standard tricks for dealing with expensive to copy objects, though it
> shouldn't be necessary except in the most extreme conditions.

I'm all for atomic ref counting, personally. It simplifies all kinds of stuff. And 
as long as the overhead is small enough, I don't really see a downside.

- Jonathan M Davis