Atomic Ref Counting
Jonathan M Davis
jmdavisProg at gmx.com
Sun Nov 21 18:03:52 PST 2010
On Sunday 21 November 2010 17:59:35 dsimcha wrote:
> This message originally appeared on the Phobos list, but I've grown
> impatient waiting for responses and decided it needs to see a wider
> audience:
>
> Now that I have time, I did some benchmarks of atomic ref counting. It is
> somewhat expensive, but not obscenely so. Here's my benchmark:
>
> import std.stdio, std.datetime, std.container;
>
> void main() {
> Array!uint arr;
> arr.length = 4;
>
> auto sw = StopWatch(autoStart);
> foreach(i; 0..10_000_000) {
> auto arr2 = arr;
> }
>
> writeln(sw.peek.milliseconds);
> }
>
> Here's a diff of std.typecons:
>
> Index: typecons.d
> ===================================================================
> --- typecons.d (revision 2181)
> +++ typecons.d (working copy)
> @@ -2328,7 +2328,16 @@
> this(this)
> {
> if (!RefCounted.isInitialized) return;
> - ++RefCounted._store._count;
> + //++RefCounted._store._count;
> + auto p = &(RefCounted._store._count);
> + asm {
> + push EAX;
> + mov EAX, p;
> + lock;
> + inc dword ptr[EAX];
> + pop EAX;
> + }
> +
> debug(RefCounted) if (RefCounted.debugging)
> writeln(typeof(this).stringof,
> "@", cast(void*) RefCounted._store, ": bumped refcount to
> ", @@ -2345,7 +2354,23 @@
> {
> if (!RefCounted._store) return;
> assert(RefCounted._store._count > 0);
> - if (--RefCounted._store._count)
> +
> + // Can't just do lock; dec; here because I need the value
> afterwords. + size_t oldVal = void;
> + auto p = &(RefCounted._store._count);
> + asm {
> + push EAX;
> + push EBX;
> + mov EAX, size_t.max; // + size_t.max is equivalent to - 1.
> + mov EBX, p;
> + lock;
> + xadd [EBX], EAX;
> + mov oldVal, EAX;
> + pop EBX;
> + pop EAX;
> + }
> +
> + if (oldVal > 1)
> {
> debug(RefCounted) if (RefCounted.debugging)
> writeln(typeof(this).stringof,
>
> Note the direct use of ASM. Between function call overhead and the fact
> that core.atomic uses a CAS loop, using it instead would have about
> doubled the overhead. I'm not sure how we'll factor this when/if this
> hits production code.
>
> Anyhow, the timings are:
>
> Atomic: 258 milliseconds total (258 nanoseconds per iteration)
> Non-atomic: 88 milliseconds (88 nanoseconds per iteration)
>
> I think this is a small enough overhead that we should just make all
> reference counting atomic. This would make the race condition pointed out
> by Michael Fortin completely moot IIUC. It would also make it possible to
> safely share reference counted containers across threads in
> shared/synchronized wrappers, etc.
>
> Furthermore, the cost of atomic reference counting can be mitigated by the
> standard tricks for dealing with expensive to copy objects, though it
> shouldn't be necessary except in the most extreme conditions.
I'm all for atomic ref counting, personally. It simplifies all kinds of stuff. And
as long as the overhead is small enough, I don't really see a downside.
- Jonathan M Davis
More information about the Digitalmars-d
mailing list