Performance of GC.collect() for single block of `byte`s

Mon Sep 24 14:31:45 UTC 2018

On 9/24/18 5:49 AM, Per Nordlöw wrote:
> The code
> 
> import std.stdio;
> 
> void main(string[] args)
> {
>      import std.datetime.stopwatch : benchmark;
>      import core.time : Duration;
>      import core.memory : GC;
> 
>      immutable benchmarkCount = 1;
> 
>      foreach (const i; 0 .. 10)
>      {
>          const byteCount = i*100_000_000;
>          const array = new byte[byteCount]; // one Gig
>          const Duration[1] results = 
> benchmark!(GC.collect)(benchmarkCount);
>          writefln("%s bytes: Calling GC.collect() took %s nsecs after %s",
>                   byteCount, 
> cast(double)results[0].total!"nsecs"/benchmarkCount, array.ptr);
>      }
> }
> 
> prints when release-compiled with ldc
> 
> 0 bytes: Calling GC.collect() took 600 nsecs after null
> 100000000 bytes: Calling GC.collect() took 83000 nsecs after 7F785ED44010
> 200000000 bytes: Calling GC.collect() took 114600 nsecs after 7F784CF29010
> 300000000 bytes: Calling GC.collect() took 277600 nsecs after 7F7832201010
> 400000000 bytes: Calling GC.collect() took 400400 nsecs after 7F780E5CC010
> 500000000 bytes: Calling GC.collect() took 449700 nsecs after 7F77E1A8A010
> 600000000 bytes: Calling GC.collect() took 481200 nsecs after 7F780E5CC010
> 700000000 bytes: Calling GC.collect() took 529800 nsecs after 7F77E1A8A010
> 800000000 bytes: Calling GC.collect() took 547600 nsecs after 7F779A16E010
> 900000000 bytes: Calling GC.collect() took 925500 nsecs after 7F7749891010
> 
> Why is the overhead so big for a single allocation of an array with 
> elements containing no indirections (which the GC doesn't need to scan 
> for pointers).

It's not scanning the blocks. But it is scanning the stack.

Each time you are increasing the space it must search for a given 
*target*. It also must *collect* any previous items at the end of the 
scan. Note that a collection is going to mark every single page and 
bitset that is contained in the item being collected (which gets 
increasingly larger).

What you are timing is the incremental work being done by an ever 
growing task. And really, this isn't that much (at the end, you are 
still less than 1ms).

One thing that may cause some headache here is that when you have a 
large block, it is part of a long chain of pages. I'm not sure I 
remember correctly, but it's a logarithmic search to see if a pointer 
points at a live GC block, but it's a linear search to find the front of 
it. It may also have changed since I looked at it last.

-Steve