Message passing between threads: Java 4 times faster than D

Artur Skawina art.08.09 at gmail.com
Fri Feb 10 06:55:36 PST 2012


On 02/10/12 14:54, Oliver Plow wrote:
>>> I wonder how much it helps to just optimize the GC a little.  How much
>>> does the performance gap close when you use DMD 2.058 beta instead of
>>> 2.057?  This upcoming release has several new garbage collector
>>> optimizations.  If the GC is the bottleneck, then it's not surprising
> 
> Is there a way to "turn off" the GC, e.g. a compiler switch to set the heap size to a large number so that the GC is likely not to set in? I searched through this page: http://www.d-programming-language.org/dmd-windows.html#switches But couldn't find anything helpful. Then you could measure the thing with GC "turned off" to see whether the GC is the problem or not.


Calling GC.disable() at runtime will postpone collections until memory
is actually exhausted, but won't disable the GC completely.
Having a standard no-op GC stub selectable by a compiler switch would
be nice, but you can get the same effect by giving the linker an object
file that provides the necessary stubs.
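A minimal sketch of the runtime approach (the benchmark body is a
placeholder; GC.disable() delays, but does not fully prevent,
collections):

```d
import core.memory : GC;

void main()
{
    GC.disable();            // postpone collections; allocations still work
    scope(exit) GC.enable(); // re-enable on the way out

    // ... allocation-heavy benchmark code goes here ...

    // Even while disabled, a collection can still be forced explicitly:
    // GC.collect();
}
```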

For this test case, something like the patch below improves things
significantly; for more gains, std.concurrency would need more invasive
changes.

Note it's just a proof of concept, meant to measure the efficiency of
the current std.concurrency against other approaches. The "freelist"
arrays are never freed and leak; a complete implementation would free
them when the link is shut down.
Ignore the synchronize() calls; "synchronized" isn't properly lowered
by the compiler, so I had to resort to this after switching the locking
primitives. It should work equally well with the standard
"synchronized".

With this change, the original testcase from this thread achieves ~4M
msg/sec (the numbers aren't stable, but are mostly in the 3.5..4.0M
range; 4.5M+ happens occasionally). Memory usage also decreases
noticeably.

artur

--- std/concurrency.d
+++ std/concurrency.d
@@ -1387,7 +1396,7 @@ private
                 m_last = n;
             Node* todelete = n.next;
             n.next = n.next.next;
-            //delete todelete;
+            delete todelete;
             m_count--;
         }
 
@@ -1430,6 +1439,56 @@ private
             {
                 val = v;
             }
+            import core.memory;
+            import core.exception;
+            new(size_t size) {
+               void* p;
+               if (afreelist.length)
+                  p = afreelist[--afreelist.length];
+               else if (gfreelist.length) {
+                  {
+                     scope lock = synchronize(fl);
+                     if (gfreelist.length) {
+                        afreelist = cast(Node*[])gfreelist;
+                        gfreelist.length=0;
+                     }
+                  }
+                  if (afreelist.length)
+                     p = afreelist[--afreelist.length];
+               }
+               
+               if (p)
+                  return p;
+
+               p = std.c.stdlib.malloc(size);
+               if (!p)
+                   throw new OutOfMemoryError();
+               GC.addRange(p, size);
+               return p;
+            }
+            delete(void* p) {
+               if (!p)
+                  return;
+               pfreelist ~= cast(Node*)p;
+               if (pfreelist.length>=8)
+               {
+                  {
+                     scope lock = synchronize(fl);
+                     gfreelist ~= cast(shared Node*[])pfreelist;
+                  }
+                  pfreelist.length=0;
+                  pfreelist.assumeSafeAppend();
+               }
+               // At some point all free nodes need to be freed, using:
+               //GC.removeRange(p);
+               //std.c.stdlib.free(p);
+            }
+            static Node*[] afreelist;
+            static ubyte[56] d1;
+            static Node*[] pfreelist;
+            static ubyte[56] d2;
+            shared static Node*[] gfreelist;
+            shared static Mutex fl;
         }
 
 

