Performance of GC.collect() for single block of `byte`s
Per Nordlöw
per.nordlow at gmail.com
Fri Sep 28 09:14:18 UTC 2018
On Monday, 24 September 2018 at 14:31:45 UTC, Steven Schveighoffer wrote:
>> Why is the overhead so big for a single allocation of an array
>> with elements containing no indirections (which the GC doesn't
>> need to scan for pointers).
>
> It's not scanning the blocks. But it is scanning the stack.
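Steven's point that the byte block itself is not scanned can be checked directly. druntime flags a block allocated for an array whose element type has no indirections as `NO_SCAN`, so the collector never looks inside it for pointers. The sketch below assumes `GC.query` from `core.memory`, which returns the `BlkInfo` (base, size, attr) of the block containing a pointer; the helper `isNoScan` is a hypothetical name used only for illustration.

```d
import core.memory : GC;
import std.stdio : writefln;

/// Hypothetical helper: true if the GC block containing `p` is flagged
/// NO_SCAN, i.e. the collector will not scan its contents for pointers.
bool isNoScan(const void* p)
{
    return (GC.query(p).attr & GC.BlkAttr.NO_SCAN) != 0;
}

void main()
{
    auto bytes = new byte[1024];     // element type has no indirections
    auto pointers = new void*[128];  // element type is itself a pointer

    writefln("byte[]  NO_SCAN: %s", isNoScan(bytes.ptr));
    writefln("void*[] NO_SCAN: %s", isNoScan(pointers.ptr));
}
```

On a standard druntime this should report `true` for the `byte[]` block and `false` for the `void*[]` block, which is consistent with the cost of `GC.collect()` here coming from fixed work such as scanning the stack and other roots rather than from the array's contents.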
OK, I modified the code to the following:
import std.stdio;

/// Allocate and immediately free `byteCount` bytes on the C heap.
void* mallocAndFreeBytes(size_t byteCount)()
{
    import core.memory : pureMalloc, pureFree;
    void* ptr = pureMalloc(byteCount);
    pureFree(ptr);
    return ptr; // for side-effects, so the calls are not optimized away
}

void main(string[] args)
{
    import std.datetime.stopwatch : benchmark;
    import core.time : Duration;
    immutable benchmarkCount = 1;
    // GC
    static foreach (const i; 0 .. 31)
    {
        // inner braces give each unrolled iteration its own scope
        {
            enum byteCount = 2^^i;
            const Duration[1] resultsC =
                benchmark!(mallocAndFreeBytes!(byteCount))(benchmarkCount);
            writef("%s bytes: mallocAndFreeBytes: %s nsecs",
                   byteCount,
                   cast(double)resultsC[0].total!"nsecs"/benchmarkCount);
            import core.memory : GC;
            auto dArray = new byte[byteCount]; // 2^^i bytes, 1 GiB on the last iteration
            const Duration[1] resultsD =
                benchmark!(GC.collect)(benchmarkCount);
            writefln(" GC.collect(): %s nsecs after %s",
                     cast(double)resultsD[0].total!"nsecs"/benchmarkCount, dArray.ptr);
            dArray = null;
        }
    }
}
I still believe these numbers are absolutely horrible:
1 bytes: mallocAndFreeBytes: 400 nsecs GC.collect(): 21600 nsecs after 7F1ECC0B1000
2 bytes: mallocAndFreeBytes: 300 nsecs GC.collect(): 20800 nsecs after 7F1ECC0B1010
4 bytes: mallocAndFreeBytes: 200 nsecs GC.collect(): 20500 nsecs after 7F1ECC0B1000
8 bytes: mallocAndFreeBytes: 200 nsecs GC.collect(): 20300 nsecs after 7F1ECC0B1010
16 bytes: mallocAndFreeBytes: 200 nsecs GC.collect(): 23200 nsecs after 7F1ECC0B2000
32 bytes: mallocAndFreeBytes: 200 nsecs GC.collect(): 19600 nsecs after 7F1ECC0B1000
64 bytes: mallocAndFreeBytes: 200 nsecs GC.collect(): 17800 nsecs after 7F1ECC0B2000
128 bytes: mallocAndFreeBytes: 300 nsecs GC.collect(): 16600 nsecs after 7F1ECC0B1000
256 bytes: mallocAndFreeBytes: 200 nsecs GC.collect(): 16200 nsecs after 7F1ECC0B2000
512 bytes: mallocAndFreeBytes: 300 nsecs GC.collect(): 15900 nsecs after 7F1ECC0B1000
1024 bytes: mallocAndFreeBytes: 200 nsecs GC.collect(): 15700 nsecs after 7F1ECC0B2000
2048 bytes: mallocAndFreeBytes: 200 nsecs GC.collect(): 14600 nsecs after 7F1ECC0B1010
4096 bytes: mallocAndFreeBytes: 300 nsecs GC.collect(): 14400 nsecs after 7F1ECC0B2010
8192 bytes: mallocAndFreeBytes: 200 nsecs GC.collect(): 14200 nsecs after 7F1ECC0B4010
16384 bytes: mallocAndFreeBytes: 200 nsecs GC.collect(): 14100 nsecs after 7F1ECC0B7010
32768 bytes: mallocAndFreeBytes: 200 nsecs GC.collect(): 14200 nsecs after 7F1ECC0BC010
65536 bytes: mallocAndFreeBytes: 200 nsecs GC.collect(): 14200 nsecs after 7F1ECC0C5010
131072 bytes: mallocAndFreeBytes: 200 nsecs GC.collect(): 14200 nsecs after 7F1ECC0D6010
262144 bytes: mallocAndFreeBytes: 200 nsecs GC.collect(): 14200 nsecs after 7F1ECC0F7010
524288 bytes: mallocAndFreeBytes: 300 nsecs GC.collect(): 17500 nsecs after 7F1ECAC14010
1048576 bytes: mallocAndFreeBytes: 200 nsecs GC.collect(): 18000 nsecs after 7F1ECAC95010
2097152 bytes: mallocAndFreeBytes: 500 nsecs GC.collect(): 18700 nsecs after 7F1ECAD96010
4194304 bytes: mallocAndFreeBytes: 300 nsecs GC.collect(): 20000 nsecs after 7F1ECA514010
8388608 bytes: mallocAndFreeBytes: 400 nsecs GC.collect(): 61000 nsecs after 7F1EC9913010
16777216 bytes: mallocAndFreeBytes: 24900 nsecs GC.collect(): 27100 nsecs after 7F1EC8112010
33554432 bytes: mallocAndFreeBytes: 800 nsecs GC.collect(): 36600 nsecs after 7F1EC5111010
67108864 bytes: mallocAndFreeBytes: 600 nsecs GC.collect(): 57900 nsecs after 7F1EBF110010
134217728 bytes: mallocAndFreeBytes: 500 nsecs GC.collect(): 98300 nsecs after 7F1EB310F010
268435456 bytes: mallocAndFreeBytes: 700 nsecs GC.collect(): 175700 nsecs after 7F1E9B10E010
536870912 bytes: mallocAndFreeBytes: 600 nsecs GC.collect(): 326900 nsecs after 7F1E6B10D010
1073741824 bytes: mallocAndFreeBytes: 900 nsecs GC.collect(): 641500 nsecs after 7F1E0B04B010
How is it possible for GC.collect() to be roughly 50 to 700 times slower than a malloc/free call, when all I have allocated is a single array of bytes with no indirections? It is such a simple case!
I really don't understand this...
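One way to separate the fixed cost of a collection from any cost that grows with the array is to time `GC.collect()` with and without a large `byte[]` live, averaging over many calls instead of a single one (`benchmarkCount = 1` in the program above makes every number a one-shot sample). The sketch below uses the same `benchmark` function from `std.datetime.stopwatch`; the helper name `averageCollectNsecs`, the 64 MiB array size and the 1000 runs are arbitrary, hypothetical choices, and no particular timings are implied.

```d
import core.memory : GC;
import core.time : Duration;
import std.datetime.stopwatch : benchmark;
import std.stdio : writefln;

/// Hypothetical helper: average wall-clock time of one GC.collect()
/// over `runs` consecutive calls, in nanoseconds.
double averageCollectNsecs(uint runs)
{
    const Duration[1] timings = benchmark!(GC.collect)(runs);
    return cast(double) timings[0].total!"nsecs" / runs;
}

void main()
{
    enum uint runs = 1000;

    // Baseline: no large array is live on the GC heap.
    writefln("no live array:      %s nsecs", averageCollectNsecs(runs));

    // Same measurement while a 64 MiB NO_SCAN byte[] is live.
    auto live = new byte[64 * 1024 * 1024];
    writefln("64 MiB byte[] live: %s nsecs", averageCollectNsecs(runs));

    // Use `live` after the second measurement so it stays reachable.
    writefln("array at %s", live.ptr);
}
```

If the two averages come out close, most of the time per call is the fixed cost of a collection cycle rather than work proportional to the array.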
More information about the Digitalmars-d-learn mailing list