Performance of GC.collect() for single block of `byte`s

Rainer Schuetze r.sagitario at gmx.de
Wed Oct 3 08:46:24 UTC 2018



On 01/10/2018 15:51, Steven Schveighoffer wrote:
> On 10/1/18 3:21 AM, Rainer Schuetze wrote:
>> A profiler reveals that most of the time is spent in "sweeping" the 
>> memory, i.e. looking for allocations no longer referenced. The 
>> existing implementation checks every page which causes a linear growth 
>> of required CPU resources with used memory.
> 
> Ouch! I hadn't thought of that.  But we aren't checking actual *pages*, 
> right, we are checking bits to see where allocations are present?

It does not touch the pages themselves; it scans a byte array that 
classifies each page as free, used, or the first page of a used block.


> I also remember that the way the bitsets work, they are always allocated 
> for every 16 bytes, regardless of the block size for that page/pool. I 
> didn't love that feature but maybe it is fixed by now.

Not for the pool for "large" objects; that one uses one bit per page. 
For small-object pools, it is still as you remember.

> I would imagine that checking 64 pages at once should be possible by 
> logic-anding the allocated and unmarked bits to check things as quickly 
> as possible.

With my patch, the number of pages of each allocated block and the 
lengths of runs of consecutive free pages are known (by consulting 
another array), so no bit-scanning is necessary.


More information about the Digitalmars-d-learn mailing list