Performance of GC.collect() for single block of `byte`s

Per Nordlöw per.nordlow at gmail.com
Fri Sep 28 09:32:13 UTC 2018


On Friday, 28 September 2018 at 09:14:18 UTC, Per Nordlöw wrote:
> How is it possible for the GC to be 500-1000 times slower than 
> a malloc-free call for a single array containing just bytes 
> with no indirections for such a simple function!!!?
>
> I really don't understand this...

I changed the code so that printing no longer uses the GC:

import core.stdc.stdio: printf;

void* mallocAndFreeBytes(size_t byteCount)()
{
     import core.memory : pureMalloc, pureFree;
     void* ptr = pureMalloc(byteCount);
     pureFree(ptr);
     return ptr;                 // for side-effects
}

void main(string[] args)
{
     import std.datetime.stopwatch : benchmark;
     import core.time : Duration;

     immutable benchmarkCount = 1;

     // Compare malloc+free against GC.collect() for block sizes 2^^0 .. 2^^31
     static foreach (const size_t i; 0 .. 32)
     {
         {
             enum byteCount = 2UL^^i;

             // instantiate with the byte count, not the exponent i
             const Duration[1] resultsC =
                 benchmark!(mallocAndFreeBytes!(byteCount))(benchmarkCount);
             printf("%llu bytes: mallocAndFreeBytes: %zu nsecs",
                    byteCount,
                    cast(size_t)(cast(double)resultsC[0].total!"nsecs"/benchmarkCount));

             import core.memory : GC;
             auto dArray = new byte[byteCount]; // byteCount bytes, still referenced during the collection
             const Duration[1] resultsD =
                 benchmark!(GC.collect)(benchmarkCount);
             printf("  GC.collect(): %zu nsecs after %p\n",
                    cast(size_t)(cast(double)resultsD[0].total!"nsecs"/benchmarkCount),
                    dArray.ptr);
             dArray = null;
         }
     }
}

I still get terrible numbers:

1 bytes: mallocAndFreeBytes: 600 nsecs  GC.collect(): 29600 nsecs after 0x7fbab535b000
2 bytes: mallocAndFreeBytes: 500 nsecs  GC.collect(): 28600 nsecs after 0x7fbab535b010
4 bytes: mallocAndFreeBytes: 400 nsecs  GC.collect(): 27700 nsecs after 0x7fbab535b000
8 bytes: mallocAndFreeBytes: 400 nsecs  GC.collect(): 27600 nsecs after 0x7fbab535b010
16 bytes: mallocAndFreeBytes: 300 nsecs  GC.collect(): 32100 nsecs after 0x7fbab535c000
32 bytes: mallocAndFreeBytes: 400 nsecs  GC.collect(): 27100 nsecs after 0x7fbab535b000
64 bytes: mallocAndFreeBytes: 300 nsecs  GC.collect(): 48500 nsecs after 0x7fbab535c000
128 bytes: mallocAndFreeBytes: 400 nsecs  GC.collect(): 23300 nsecs after 0x7fbab535b000
256 bytes: mallocAndFreeBytes: 300 nsecs  GC.collect(): 22300 nsecs after 0x7fbab535c000
512 bytes: mallocAndFreeBytes: 400 nsecs  GC.collect(): 21800 nsecs after 0x7fbab535b000
1024 bytes: mallocAndFreeBytes: 300 nsecs  GC.collect(): 21900 nsecs after 0x7fbab535c000
2048 bytes: mallocAndFreeBytes: 300 nsecs  GC.collect(): 26300 nsecs after 0x7fbab3ebe010
4096 bytes: mallocAndFreeBytes: 300 nsecs  GC.collect(): 25100 nsecs after 0x7fbab3ebf010
8192 bytes: mallocAndFreeBytes: 300 nsecs  GC.collect(): 24500 nsecs after 0x7fbab3ec1010
16384 bytes: mallocAndFreeBytes: 400 nsecs  GC.collect(): 24700 nsecs after 0x7fbab3ec4010
32768 bytes: mallocAndFreeBytes: 300 nsecs  GC.collect(): 24600 nsecs after 0x7fbab3ec9010
65536 bytes: mallocAndFreeBytes: 300 nsecs  GC.collect(): 24600 nsecs after 0x7fbab3ed2010
131072 bytes: mallocAndFreeBytes: 300 nsecs  GC.collect(): 25000 nsecs after 0x7fbab3ee3010
262144 bytes: mallocAndFreeBytes: 300 nsecs  GC.collect(): 25000 nsecs after 0x7fbab3f04010
524288 bytes: mallocAndFreeBytes: 400 nsecs  GC.collect(): 25200 nsecs after 0x7fbab3f45010
1048576 bytes: mallocAndFreeBytes: 300 nsecs  GC.collect(): 25800 nsecs after 0x7fbab3fc6010
2097152 bytes: mallocAndFreeBytes: 300 nsecs  GC.collect(): 17200 nsecs after 0x7fbab37be010
4194304 bytes: mallocAndFreeBytes: 500 nsecs  GC.collect(): 25700 nsecs after 0x7fbab39bf010
8388608 bytes: mallocAndFreeBytes: 400 nsecs  GC.collect(): 65500 nsecs after 0x7fbab2bbd010
16777216 bytes: mallocAndFreeBytes: 1100 nsecs  GC.collect(): 47200 nsecs after 0x7fbab13bc010
33554432 bytes: mallocAndFreeBytes: 800 nsecs  GC.collect(): 50300 nsecs after 0x7fbaae3bb010
67108864 bytes: mallocAndFreeBytes: 800 nsecs  GC.collect(): 63800 nsecs after 0x7fbaa83ba010
134217728 bytes: mallocAndFreeBytes: 600 nsecs  GC.collect(): 100000 nsecs after 0x7fba9c3b9010
268435456 bytes: mallocAndFreeBytes: 1000 nsecs  GC.collect(): 176100 nsecs after 0x7fba843b8010
536870912 bytes: mallocAndFreeBytes: 1000 nsecs  GC.collect(): 415500 nsecs after 0x7fba543b7010
1073741824 bytes: mallocAndFreeBytes: 800 nsecs  GC.collect(): 649900 nsecs after 0x7fb9f42f5010
2147483648 bytes: mallocAndFreeBytes: 1200 nsecs  GC.collect(): 973800 nsecs after 0x7fb934112010

The GC.collect() time seems to scale roughly linearly with byteCount 
above 16 MB. So it looks like the GC is scanning the allocated block 
of bytes even though the array's element type is a value type with 
no indirections. Why?
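
One way to test the scanning hypothesis directly would be to ask the GC 
which attributes it recorded for the block. The following is only a 
minimal sketch, assuming druntime's core.memory API (GC.query and 
GC.BlkAttr.NO_SCAN); isNoScan is a hypothetical helper name and is not 
part of the benchmark above. GC.query is used instead of GC.getAttr 
because the .ptr of a large array may point into the interior of its 
block, and getAttr is documented to return zero for interior pointers.

```d
import core.memory : GC;
import core.stdc.stdio : printf;

/// Hypothetical helper: true if the GC block containing `p` carries the
/// NO_SCAN attribute, i.e. the collector should not scan its contents
/// for pointers during marking.
bool isNoScan(const(void)* p)
{
    // GC.query returns information about the block that contains `p`,
    // including its attribute bit field.
    return (GC.query(p).attr & GC.BlkAttr.NO_SCAN) != 0;
}

void main()
{
    auto bytes = new byte[1024];   // element type has no indirections
    auto ptrs  = new void*[1024];  // element type is a pointer

    printf("byte[]  NO_SCAN: %d\n", cast(int) isNoScan(bytes.ptr));
    printf("void*[] NO_SCAN: %d\n", cast(int) isNoScan(ptrs.ptr));
}
```

If the byte[] block does report NO_SCAN, then marking the block's 
contents is presumably not where the time goes, and the cost would 
have to come from some other part of the collection.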

If I zero the pointer just after allocation, GC.collect() 
consistently takes about 100ns, so it can't be related to the stack.

