[Issue 20434] New: Drastic GC slowdown with repeated large allocations

Fri Dec 6 00:10:19 UTC 2019

https://issues.dlang.org/show_bug.cgi?id=20434

          Issue ID: 20434
           Summary: Drastic GC slowdown with repeated large allocations
           Product: D
           Version: D2
          Hardware: x86_64
                OS: Linux
            Status: NEW
          Severity: enhancement
          Priority: P1
         Component: druntime
          Assignee: nobody at puremagic.com
          Reporter: gregormueckl at gmx.de

I accidentally found a huge slowdown in the GC when running this synthetic
benchmark. I ran this program on Windows and Linux as 32 and 64 bit program
with largely the same result:

----
module gcbench;

import core.memory;
import core.time;
import std.array;
import std.datetime;
import std.random;
import std.stdio;

int main()
{
        for(int i = 0; i < 160; i++) {
                SysTime startTime = Clock.currTime();
                for(int j = 0; j < 1000; j++) {
                        int size = uniform!"[]"(1, 20);
                        int[] trash = uninitializedArray!(int[])(i *
1024*1024);
                        if(j % 10 == 0) GC.collect();
                }
                SysTime endTime = Clock.currTime();
                Duration duration = endTime - startTime;
                double msecs = duration.total!"msecs"();
                writeln(i, " ", msecs);
        }
        return 0;
}
----

Runtime of the inner loop increases roughly linearly with each new iteration.
It should become constant after some time. Instead, runtime can increase by 100
times and more. perf seems to identify a plausible culprit:

    55.65%  gcbench  gcbench             [.]
_D2gc4impl12conservativeQw15LargeObjectPool10allocPagesMFNbmZm
    12.46%  gcbench  gcbench             [.]
_D2gc4impl12conservativeQw3Gcx12collectRootsMFNbNlPvQcZv
    11.33%  gcbench  libc-2.30.so        [.] __memset_avx2_erms
     4.52%  gcbench  gcbench             [.]
_D2gc4impl12conservativeQw3Gcx5sweepMFNbZm
     1.66%  gcbench  libpthread-2.30.so  [.]
pthread_cond_timedwait@@GLIBC_2.3.2
     1.43%  gcbench  libpthread-2.30.so  [.] __pthread_mutex_unlock_usercnt
     1.39%  gcbench  libpthread-2.30.so  [.] __lll_lock_wait

Looking at trace for LargeObjectPool.allocPages, it shows that a single small
loop is responsible. Unfortunately, perf doesn't show the corresponding source
lines.

I suspect that large arrays are kept alive across GC runs. Maybe Issue #13558
is related because I do not see any arrays getting freed at all, That issue is
still marked as NEW.

Zeroing the arrays doesn't change the result.

--