Performance of GC.collect() for single block of `byte`s
Steven Schveighoffer
schveiguy at gmail.com
Mon Oct 1 13:51:15 UTC 2018
On 10/1/18 3:21 AM, Rainer Schuetze wrote:
>
>
> On 28/09/2018 14:21, Per Nordlöw wrote:
>> On Monday, 24 September 2018 at 14:31:45 UTC, Steven Schveighoffer wrote:
>>> It's not scanning the blocks. But it is scanning the stack.
>>>
>>> Each time you are increasing the space it must search for a given
>>> *target*. It also must *collect* any previous items at the end of the
>>> scan. Note that a collection is going to mark every single page and
>>> bitset that is contained in the item being collected (which gets
>>> increasingly larger).
>>
>> Is this because of the potentially (many) slices referencing this
>> large block?
>>
>> I assume the GC doesn't scan the `byte`-array for pointer-values in
>> this case, but that happens for `void`-arrays and class/pointer-arrays
>> right?
>>
>> Couldn't that scan be optimized by adding a bitset that indicates
>> which pages need to be scanned?
>>
>> Is it common for GC's to treat large objects in this way?
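To the scanning question: in D, blocks whose element type contains no pointers (like `byte`) are allocated with the `NO_SCAN` attribute, so the GC never looks inside them for pointer values, while `void*`/class arrays are scanned. A minimal sketch (the helper name is mine, not from the thread):

```d
import core.memory : GC;

/// True if the GC will skip scanning the contents of the given allocation.
/// p must be the base pointer of a GC-allocated block.
bool notScanned(const void* p)
{
    return (GC.getAttr(p) & GC.BlkAttr.NO_SCAN) != 0;
}
```

For example, `notScanned((new byte[](64)).ptr)` is true, while for `new void*[](64)` the attribute is not set and the block's contents are scanned during marking.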
>
> A profiler reveals that most of the time is spent in "sweeping" the
> memory, i.e. looking for allocations no longer referenced. The existing
> implementation checks every page which causes a linear growth of
> required CPU resources with used memory.
Ouch! I hadn't thought of that. But we aren't checking actual *pages*,
right? We are checking bits to see where allocations are present?
I also remember that the bitsets are always allocated with one bit per
16 bytes, regardless of the block size for that page/pool. I didn't love
that design, but maybe it has been fixed by now.
I would imagine that checking 64 pages at once should be possible by
AND-ing the allocated bits with the inverse of the mark bits, so
unreferenced entries can be found a whole word at a time.
>
> This version https://github.com/rainers/druntime/tree/gc_opt_sweep takes
> advantage of the known size of allocations to skip unnecessary checks.
> The last commit also adds support for keeping track of the size of
> blocks of consecutive free pages. With this your example has more or
> less constant collection time (note that most of the program time is
> spent setting the array to zero, though not measured, and that the
> allocation often triggers a collection, too).
Yes, especially when you are increasing the allocation size each time.
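A minimal benchmark in the spirit of the example under discussion (a sketch; the function name and parameters are mine, not from the original code):

```d
import core.memory : GC;
import core.time : Duration;
import std.datetime.stopwatch : StopWatch, AutoStart;

/// Time GC.collect() after each of a series of growing allocations.
Duration[] collectTimes(size_t steps, size_t growth)
{
    Duration[] times;
    foreach (i; 1 .. steps + 1)
    {
        // Each step allocates a larger block; the allocation itself may
        // also trigger a collection, which is not measured here.
        auto block = new byte[](i * growth);
        auto sw = StopWatch(AutoStart.no);
        sw.start();
        GC.collect(); // the previous, now-unreferenced block gets swept
        sw.stop();
        times ~= sw.peek();
    }
    return times;
}
```

With the old sweep, the measured times grow with the total memory in use; with the size-aware sweep they stay roughly constant.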
> I also noticed a rather serious bug for huge allocations:
> https://issues.dlang.org/show_bug.cgi?id=19281
-Steve
More information about the Digitalmars-d-learn mailing list