More embarrassing microbenchmarks for D's GC.

Sean Kelly sean at invisibleduck.org
Mon Jun 9 15:53:10 PDT 2008


== Quote from Leandro Lucarella (llucax at gmail.com)'s article
> But there are a few other results I can't explain:
> 1) Why is the D GC version (GC disabled or not) ~25% slower than the D
>    version that uses malloc when iterating the list? There shouldn't be
>    any GC activity in that part. Could it be some GC locality issue that
>    yields more cache misses?

I think it may have more to do with the allocation strategy in the GC.  It obtains
memory from the OS in chunks, and each chunk is typically at most 8MB.  So
for a test like this the D GC will end up hitting the OS quite a few times asking
for more memory.  If I had to guess, I'd say that malloc has a more efficient
strategy here.  If you're interested, try running the same test using Tango
with and without a call to tango.core.Memory.GC.reserve(), placed before the
loop, for the amount of memory you expect the app to use.
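
Something along these lines is what I have in mind.  This is only a rough
sketch, not your actual benchmark: the Node layout, the node count, and the
helper names buildList and countNodes are made up for illustration, and it is
written against the current druntime module core.memory.  With Tango the
import would be tango.core.Memory instead.  The size passed to reserve() is
only an estimate, since the GC rounds allocations up to its own bin sizes.

```d
import core.memory : GC;

// Hypothetical list node: a next pointer plus a 64-bit counter.
struct Node
{
    Node* next;
    ulong value;
}

// Build a singly linked list of n GC-allocated nodes and return its head.
Node* buildList(size_t n)
{
    Node* head = null;
    foreach (i; 0 .. n)
    {
        auto node = new Node;
        node.value = i;
        node.next = head;
        head = node;
    }
    return head;
}

// Walk the list and return the number of nodes visited.
size_t countNodes(Node* head)
{
    size_t count = 0;
    for (auto p = head; p !is null; p = p.next)
        ++count;
    return count;
}

void main()
{
    enum size_t nodeCount = 1_000_000;

    // Ask the GC to obtain roughly this much memory from the OS up front,
    // rather than growing chunk by chunk while the list is being built.
    // Comment this line out to get the run without the reservation.
    GC.reserve(nodeCount * Node.sizeof);

    auto head = buildList(nodeCount);
    assert(countNodes(head) == nodeCount);
}
```

Time it once with the GC.reserve() line and once without.  If the two runs
come out about the same, then the chunked allocation probably isn't the
explanation.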

> 2) Why is the D malloc version ~33% slower than the C version? I took a
>    look at the generated assembly and it's almost identical:
> 	<_Dmain+198>:   lea    -0x20(%ebp),%eax
> 	<_Dmain+201>:   lea    0x0(%esi,%eiz,1),%esi
> 	<_Dmain+208>:   addl   $0x1,0x8(%eax)
> 	<_Dmain+212>:   adcl   $0x0,0xc(%eax)
> 	<_Dmain+216>:   mov    (%eax),%eax
> 	<_Dmain+218>:   test   %eax,%eax
> 	<_Dmain+220>:   jne    0x804a240 <_Dmain+208>
> 	<main+248>:     lea    -0x1c(%ebp),%eax
> 	<main+251>:     nop
> 	<main+252>:     lea    0x0(%esi,%eiz,1),%esi
> 	<main+256>:     addl   $0x1,0x4(%eax)
> 	<main+260>:     adcl   $0x0,0x8(%eax)
> 	<main+264>:     mov    (%eax),%eax
> 	<main+266>:     test   %eax,%eax
> 	<main+268>:     jne    0x8048550 <main+256>
> 	<main+270>:     movl   $0x0,0x4(%esp)
> 	<main+278>:     movl   $0x8049800,(%esp)

No idea.  I'd expect them to be roughly equivalent.


Sean