More embarrassing microbenchmarks for D's GC.
Sean Kelly
sean at invisibleduck.org
Mon Jun 9 15:53:10 PDT 2008
== Quote from Leandro Lucarella (llucax at gmail.com)'s article
> But there are a few other results I can't explain:
> 1) Why is the D GC version (disabled or not) ~25% slower than the D version
> that uses malloc when iterating the list? There shouldn't be any GC
> activity in that part. Could it be some GC locality issue that yields
> more cache misses?
I think it may have more to do with the allocation strategy in the GC. It obtains
memory from the OS in chunks, and each chunk is typically at most 8MB. So
for a test like this the D GC will end up going back to the OS quite a few times
to ask for more memory. If I had to guess, I'd say that malloc has a more efficient
strategy here. If you're interested, try running the same test using Tango
with and without a call to tango.core.Memory.GC.reserve(), placed before the
loop, for the amount of memory you expect the app to use.
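
A minimal sketch of that experiment might look like the following. The Node
layout, the element count N and the helper names buildList and walk are
hypothetical stand-ins, not taken from the original benchmark; only
GC.reserve() comes from the suggestion above. Time the run with and without
the reserve call to see whether the difference shows up.

```d
import tango.core.Memory;   // Tango runtime; later druntime offers GC.reserve in core.memory

// Hypothetical list node standing in for the one used in the benchmark.
struct Node
{
    Node* next;
    long  count;
}

// Build a singly linked list of n GC-allocated nodes.
Node* buildList(size_t n)
{
    Node* head = null;
    for (size_t i = 0; i < n; ++i)
    {
        Node* node = new Node;
        node.next = head;
        head = node;
    }
    return head;
}

// Walk the list, bumping each node's counter; returns the number of nodes visited.
size_t walk(Node* head)
{
    size_t visited = 0;
    for (Node* p = head; p !is null; p = p.next)
    {
        ++p.count;
        ++visited;
    }
    return visited;
}

void main()
{
    const size_t N = 1_000_000;   // assumed element count, not from the original test

    // Ask the GC for the expected memory up front; comment this out to compare.
    // The GC may round each allocation up, so treat this figure as a rough estimate.
    GC.reserve(N * Node.sizeof);

    Node* head = buildList(N);
    assert(walk(head) == N);
}
```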
> 2) Why is D malloc version ~33% slower than the C version? I took a look
> at the generated assembly and it's almost identical:
> <_Dmain+198>: lea -0x20(%ebp),%eax
> <_Dmain+201>: lea 0x0(%esi,%eiz,1),%esi
> <_Dmain+208>: addl $0x1,0x8(%eax)
> <_Dmain+212>: adcl $0x0,0xc(%eax)
> <_Dmain+216>: mov (%eax),%eax
> <_Dmain+218>: test %eax,%eax
> <_Dmain+220>: jne 0x804a240 <_Dmain+208>
>
> <main+248>: lea -0x1c(%ebp),%eax
> <main+251>: nop
> <main+252>: lea 0x0(%esi,%eiz,1),%esi
> <main+256>: addl $0x1,0x4(%eax)
> <main+260>: adcl $0x0,0x8(%eax)
> <main+264>: mov (%eax),%eax
> <main+266>: test %eax,%eax
> <main+268>: jne 0x8048550 <main+256>
> <main+270>: movl $0x0,0x4(%esp)
> <main+278>: movl $0x8049800,(%esp)
No idea. I'd expect them to be roughly equivalent.
Sean