Message passing between threads: Java 4 times faster than D

Sean Kelly sean at invisibleduck.org
Thu Feb 9 14:11:43 PST 2012


On Feb 9, 2012, at 10:31 AM, Marco Leise wrote:

> Am 09.02.2012, 18:35 Uhr, schrieb Andrei Alexandrescu <SeeWebsiteForEmail at erdani.org>:
> 
>> On 2/9/12 6:10 AM, Gor Gyolchanyan wrote:
>>> Generally, D's message passing is implemented in quite easy-to-use
>>> way, but far from being fast.
>>> I dislike the Variant structure, because it adds a huge overhead. I'd
>>> rather have a templated message passing system with type-safe message
>>> queue, so no Variant is necessary.
>>> In specific cases Messages can be polymorphic objects. This will be
>>> way faster, then Variant.
>> 
>> cc Sean Kelly
>> 
>> I haven't looked at the implementation, but one possible liability is that large messages don't fit in a Variant and must use dynamic allocation under the wraps. There are a number of ways to avoid that, such as parallel arrays (one array per type for data and one for the additional tags).
>> 
>> We must make the message passing subsystem to not use any memory allocation in the quiescent state. If we're doing one allocation per message passed, that might explain the 4x performance difference (I have no trouble figuring Java's allocator is this much faster than D's).
> 
> Well, what does +1 Variant and +1 LinkedListNode sum up to?

FWIW, you can use DMD's built in profiler so long as the receiving thread is the same as the sending thread:

import std.concurrency;

void main() {
    for(int i = 0; i < 1_000_000; i++) {
        send(thisTid, 12345);
        auto x = receiveOnly!int();
    }
}


I generated timings for this both before and after adding "scope" to mbox.get():

$ dmd -release -inline -O abc
$ time abc

real	0m0.831s
user	0m0.829s
sys	0m0.002s

… add "scope" to mbox.get()

$ dmd -release -inline -O abc
$ time abc

real	0m0.653s
user	0m0.649s
sys	0m0.003s


And here's the trace log after "scope" was added.  Notice that there were 61 calls to GCX.fullcollect().  We can also see that there was 1 allocation per send/receive operation, so only an alloc for the message list node.

$ dmd -O -release -profile abc
gladsheim:misc sean$ time abc

real	0m11.348s
user	0m11.331s
sys	0m0.015s


======== Timer Is 3579545 Ticks/Sec, Times are in Microsecs ========

  Num          Tree        Func        Per
  Calls        Time        Time        Call

1000000   437709765   220179413         220     void std.concurrency._send!(int)._send(std.concurrency.MsgType, std.concurrency.Tid, int)
1000000   300987757   140736393         140     bool std.concurrency.MessageBox.get!(nothrow @safe void delegate(int), pure @safe void function(std.concurrency.LinkTerminated)*, pure @safe void function(std.concurrency.OwnerTerminated)*, pure @safe void function(std.variant.VariantN!(32u).VariantN)*).get(scope nothrow @safe void delegate(int), scope pure @safe void function(std.concurrency.LinkTerminated)*, scope pure @safe void function(std.concurrency.OwnerTerminated)*, scope pure @safe void function(std.variant.VariantN!(32u).VariantN)*)
1000000   202131609    89479808          89     void* gc.gcx.GC.malloc(uint, uint, uint*)
      1   825045422    57556501    57556501     _Dmain
1000033   112651800    52026745          52     void* gc.gcx.GC.mallocNoSync(uint, uint, uint*)
     61    53422342    49606106      813214     uint gc.gcx.Gcx.fullcollect(void*)
2000000   160103753    42531732          21     bool std.concurrency.MessageBox.get!(nothrow @safe void delegate(int), pure @safe void function(std.concurrency.LinkTerminated)*, pure @safe void function(std.concurrency.OwnerTerminated)*, pure @safe void function(std.variant.VariantN!(32u).VariantN)*).get(scope nothrow @safe void delegate(int), scope pure @safe void function(std.concurrency.LinkTerminated)*, scope pure @safe void function(std.concurrency.OwnerTerminated)*, scope pure @safe void function(std.variant.VariantN!(32u).VariantN)*).bool scan(ref std.concurrency.List!(std.concurrency.Message).List)
2000000    42018612    39837170          19     int std.variant.VariantN!(32u).VariantN.handler!(int).handler(std.variant.VariantN!(32u).VariantN.OpID, ubyte[32]*, void*)
1000000   117572021    24641771          24     bool std.concurrency.MessageBox.get!(nothrow @safe void delegate(int), pure @safe void function(std.concurrency.LinkTerminated)*, pure @safe void function(std.concurrency.OwnerTerminated)*, pure @safe void function(std.variant.VariantN!(32u).VariantN)*).get(scope nothrow @safe void delegate(int), scope pure @safe void function(std.concurrency.LinkTerminated)*, scope pure @safe void function(std.concurrency.OwnerTerminated)*, scope pure @safe void function(std.variant.VariantN!(32u).VariantN)*).bool onStandardMsg(ref std.concurrency.Message)
1000000    47280794    20418675          20     void std.concurrency.Message.map!(nothrow @safe void delegate(int)).map(nothrow @safe void delegate(int))
1000000   316556767    15569009          15     int std.concurrency.receiveOnly!(int).receiveOnly()
1000000    36317362    13212905          13     @property bool std.variant.VariantN!(32u).VariantN.convertsTo!(int).convertsTo()
1000000    15445906    10879089          10     std.concurrency.Message std.concurrency.Message.__ctor!(int).__ctor(std.concurrency.MsgType, int)
1000000    45649454     9332092           9     @property bool std.concurrency.Message.convertsTo!(int).convertsTo()
1000000    26790032     7875877           7     @property int std.variant.VariantN!(32u).VariantN.get!(int).get()
1000000   444757778     7048013           7     void std.concurrency._send!(int)._send(std.concurrency.Tid, int)
  15512     6681162     6657279         429     int gc.gcx.Gcx.allocPage(ubyte)
1000000   450932153     6174374           6     void std.concurrency.send!(int).send(std.concurrency.Tid, int)
1000000     4566817     4566817           4     std.variant.VariantN!(32u).VariantN std.variant.VariantN!(32u).VariantN.opAssign!(int).opAssign(int)
   4087     3635735     3518353         860     void gc.gcx.Gcx.mark(void*, void*, int)
2000000     2069965     2069965           1     int std.variant.VariantN!(32u).VariantN.handler!(int).handler(std.variant.VariantN!(32u).VariantN.OpID, ubyte[32]*, void*).bool tryPutting(int*, TypeInfo, void*)
2990271      195287      195287           0     void* gc.gcx.sentinel_add(void*)
1000000      147610      147610           0     bool std.concurrency.MessageBox.get!(nothrow @safe void delegate(int), pure @safe void function(std.concurrency.LinkTerminated)*, pure @safe void function(std.concurrency.OwnerTerminated)*, pure @safe void function(std.variant.VariantN!(32u).VariantN)*).get(scope nothrow @safe void delegate(int), scope pure @safe void function(std.concurrency.LinkTerminated)*, scope pure @safe void function(std.concurrency.OwnerTerminated)*, scope pure @safe void function(std.variant.VariantN!(32u).VariantN)*).bool pty(ref std.concurrency.List!(std.concurrency.Message).List)
1000032      121459      121459           0     void gc.gcx.Gcx.setBits(gc.gcx.Pool*, uint, uint)
2000000      111475      111475           0     int std.variant.VariantN!(32u).VariantN.handler!(int).handler(std.variant.VariantN!(32u).VariantN.OpID, ubyte[32]*, void*).int* getPtr(void*)
1000033      102413      102413           0     ubyte gc.gcx.Gcx.findBin(uint)
1002228       94742       94742           0     @property uint gc.gcx.Pool.shiftBy()
1000000       72086       72086           0     int std.concurrency.receiveOnly!(int).receiveOnly().nothrow @safe void __lambda17(int)
 995119       70854       70854           0     void gc.gcx.Gcx.log_free(void*)
1000033       61505       61505           0     void gc.gcx.sentinel_init(void*, uint)
1000033       55070       55070           0     void gc.gcx.Gcx.log_malloc(void*, uint)
 995119       51748       51748           0     void gc.gcx.sentinel_Invariant(const(void*))
      1       24692       24692       24692     void gc.gcx.Pool.initialize(uint, bool)
   3538     3544517       24525           6     void gc.gcx.Gcx.mark(void*, void*)
  15511       23883       23522           1     uint gc.gcx.Pool.allocPages(uint)
 124447       11615       11615           0     void gc.gcx.Gcx.clrBitsSmallSweep(gc.gcx.Pool*, uint, uint)
      1        7051        5921        5921     void gc.gcx.GC.initialize()
      1       27490        2797        2797     gc.gcx.Pool* gc.gcx.Gcx.newPool(uint, bool)
     61    53424783        2441          40     uint gc.gcx.Gcx.fullcollectshell()
     55        2227        2227          40     void gc.gcx.Gcx.addRange(void*, void*)
      1        1129        1119        1119     void gc.gcx.Gcx.initialize()
     16         360         360          22     uint gc.gcx.Pool.extendPages(uint)
      1        2547         319         319     void gc.gcx.GC.addRange(void*, uint)
   2196         278         278           0     gc.gcx.Pool* gc.gcx.Gcx.findPool(void*)
      1          65          65          65     void gc.gcx.GC.disable()
      1           9           9           9     void gc.gcx.Gcx.log_init()
      1           5           5           5     void gc.gcx.GC.enable()
      1           0           0           0     void gc.gcx.GC.setStackBottom(void*)
      1           0           0           0     @property uint gc.gcx.Pool.divisor()

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.puremagic.com/pipermail/digitalmars-d/attachments/20120209/3bee09f9/attachment-0001.html>


More information about the Digitalmars-d mailing list