Performance of GC.collect() for single block of `byte`s

Fri Sep 28 09:14:18 UTC 2018

On Monday, 24 September 2018 at 14:31:45 UTC, Steven Schveighoffer wrote:
>> Why is the overhead so big for a single allocation of an array
>> with elements containing no indirections (which the GC doesn't
>> need to scan for pointers)?
>
> It's not scanning the blocks. But it is scanning the stack.

OK, I modified the code to:

import std.stdio;

void* mallocAndFreeBytes(size_t byteCount)()
{
    import core.memory : pureMalloc, pureFree;
    void* ptr = pureMalloc(byteCount);
    pureFree(ptr);
    return ptr;                 // returned so the call is not optimized away
}

void main(string[] args)
{
    import std.datetime.stopwatch : benchmark;
    import core.time : Duration;
    import core.memory : GC;

    immutable benchmarkCount = 1;

    // GC
    static foreach (const i; 0 .. 31)
    {
        {
            enum byteCount = 2^^i;

            // time a malloc/free pair of byteCount (= 2^^i) bytes
            const Duration[1] resultsC =
                benchmark!(mallocAndFreeBytes!(byteCount))(benchmarkCount);
            writef("%s bytes: mallocAndFreeBytes: %s nsecs",
                   byteCount,
                   cast(double)resultsC[0].total!"nsecs"/benchmarkCount);

            // byteCount bytes (up to 1 GiB); byte has no indirections
            auto dArray = new byte[byteCount];
            const Duration[1] resultsD =
                benchmark!(GC.collect)(benchmarkCount);
            writefln(" GC.collect(): %s nsecs after %s",
                     cast(double)resultsD[0].total!"nsecs"/benchmarkCount,
                     dArray.ptr);
            dArray = null;
        }
    }
}

I still believe these numbers are absolutely horrible:

1 bytes: mallocAndFreeBytes: 400 nsecs GC.collect(): 21600 nsecs after 7F1ECC0B1000
2 bytes: mallocAndFreeBytes: 300 nsecs GC.collect(): 20800 nsecs after 7F1ECC0B1010
4 bytes: mallocAndFreeBytes: 200 nsecs GC.collect(): 20500 nsecs after 7F1ECC0B1000
8 bytes: mallocAndFreeBytes: 200 nsecs GC.collect(): 20300 nsecs after 7F1ECC0B1010
16 bytes: mallocAndFreeBytes: 200 nsecs GC.collect(): 23200 nsecs after 7F1ECC0B2000
32 bytes: mallocAndFreeBytes: 200 nsecs GC.collect(): 19600 nsecs after 7F1ECC0B1000
64 bytes: mallocAndFreeBytes: 200 nsecs GC.collect(): 17800 nsecs after 7F1ECC0B2000
128 bytes: mallocAndFreeBytes: 300 nsecs GC.collect(): 16600 nsecs after 7F1ECC0B1000
256 bytes: mallocAndFreeBytes: 200 nsecs GC.collect(): 16200 nsecs after 7F1ECC0B2000
512 bytes: mallocAndFreeBytes: 300 nsecs GC.collect(): 15900 nsecs after 7F1ECC0B1000
1024 bytes: mallocAndFreeBytes: 200 nsecs GC.collect(): 15700 nsecs after 7F1ECC0B2000
2048 bytes: mallocAndFreeBytes: 200 nsecs GC.collect(): 14600 nsecs after 7F1ECC0B1010
4096 bytes: mallocAndFreeBytes: 300 nsecs GC.collect(): 14400 nsecs after 7F1ECC0B2010
8192 bytes: mallocAndFreeBytes: 200 nsecs GC.collect(): 14200 nsecs after 7F1ECC0B4010
16384 bytes: mallocAndFreeBytes: 200 nsecs GC.collect(): 14100 nsecs after 7F1ECC0B7010
32768 bytes: mallocAndFreeBytes: 200 nsecs GC.collect(): 14200 nsecs after 7F1ECC0BC010
65536 bytes: mallocAndFreeBytes: 200 nsecs GC.collect(): 14200 nsecs after 7F1ECC0C5010
131072 bytes: mallocAndFreeBytes: 200 nsecs GC.collect(): 14200 nsecs after 7F1ECC0D6010
262144 bytes: mallocAndFreeBytes: 200 nsecs GC.collect(): 14200 nsecs after 7F1ECC0F7010
524288 bytes: mallocAndFreeBytes: 300 nsecs GC.collect(): 17500 nsecs after 7F1ECAC14010
1048576 bytes: mallocAndFreeBytes: 200 nsecs GC.collect(): 18000 nsecs after 7F1ECAC95010
2097152 bytes: mallocAndFreeBytes: 500 nsecs GC.collect(): 18700 nsecs after 7F1ECAD96010
4194304 bytes: mallocAndFreeBytes: 300 nsecs GC.collect(): 20000 nsecs after 7F1ECA514010
8388608 bytes: mallocAndFreeBytes: 400 nsecs GC.collect(): 61000 nsecs after 7F1EC9913010
16777216 bytes: mallocAndFreeBytes: 24900 nsecs GC.collect(): 27100 nsecs after 7F1EC8112010
33554432 bytes: mallocAndFreeBytes: 800 nsecs GC.collect(): 36600 nsecs after 7F1EC5111010
67108864 bytes: mallocAndFreeBytes: 600 nsecs GC.collect(): 57900 nsecs after 7F1EBF110010
134217728 bytes: mallocAndFreeBytes: 500 nsecs GC.collect(): 98300 nsecs after 7F1EB310F010
268435456 bytes: mallocAndFreeBytes: 700 nsecs GC.collect(): 175700 nsecs after 7F1E9B10E010
536870912 bytes: mallocAndFreeBytes: 600 nsecs GC.collect(): 326900 nsecs after 7F1E6B10D010
1073741824 bytes: mallocAndFreeBytes: 900 nsecs GC.collect(): 641500 nsecs after 7F1E0B04B010

How is it possible for GC.collect() to be tens to hundreds of times
slower than a malloc/free pair (about 700 times for the 1 GiB case)
when the heap holds just a single array of bytes with no
indirections? That's such a simple case!

I really don't understand this...
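To separate Steven's point (the byte block is not scanned, but the stack is) from the fixed cost of a collection, a sketch along these lines could help. It is only a sketch under assumptions: `isNoScan` and `collectNsecs` are hypothetical helpers, it assumes `GC.query` resolves interior pointers (large arrays may start a few bytes past the block base), and the 64 MiB size is arbitrary. It checks whether a `new byte[]` block carries `GC.BlkAttr.NO_SCAN`, and it averages `GC.collect()` over many runs instead of the single run (`benchmarkCount = 1`) used above, since one call can be dominated by timer granularity. No timings are claimed; they will depend on the machine, compiler and druntime version.

```d
import core.memory : GC;
import core.time : Duration;
import std.datetime.stopwatch : benchmark;
import std.stdio : writefln;

// Hypothetical helper: true if the GC block containing `p` is marked
// NO_SCAN, i.e. the collector will not look inside it for pointers.
// GC.query is used rather than GC.getAttr because getAttr returns 0 for
// interior pointers, and a large array's .ptr may not be the block base.
bool isNoScan(const(void)* p)
{
    return (GC.query(p).attr & GC.BlkAttr.NO_SCAN) != 0;
}

// Hypothetical helper: average duration of one GC.collect() in
// nanoseconds, taken over `runs` calls to smooth out timer noise.
double collectNsecs(uint runs)
{
    const Duration[1] r = benchmark!(GC.collect)(runs);
    return cast(double) r[0].total!"nsecs" / runs;
}

void main()
{
    enum uint runs = 100;

    // Baseline: collect before the large array exists.
    writefln("before allocation: GC.collect(): %s nsecs", collectNsecs(runs));

    // 64 MiB of bytes: no indirections, so the block should be NO_SCAN.
    auto bytes = new byte[64 * 1024 * 1024];
    writefln("byte[] block NO_SCAN: %s", isNoScan(bytes.ptr));
    writefln("with 64 MiB byte[]: GC.collect(): %s nsecs", collectNsecs(runs));

    // Contrast: an array of pointers must be scanned, so no NO_SCAN bit.
    auto ptrs = new void*[1024];
    writefln("void*[] block NO_SCAN: %s", isNoScan(ptrs.ptr));
}
```

If the averaged time with the byte array still grows with its size while the block reports NO_SCAN, the extra cost would have to come from somewhere other than scanning the block's contents.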