Forked GC explained

Sat Sep 3 16:37:48 UTC 2022

On Saturday, 3 September 2022 at 13:35:39 UTC, frame wrote:
> I'm not sure I fully understand how it works. I know that the 
> OS creates read only memory pages for both and if a memory 
> section is about to be written, the OS will issue a copy of the 
> pages so any write operation will be done in its own copy and 
> cannot mess up things.
>
> But then is the question, how can memory be marked as free? The 
> forked process cannot since it writes into a copy - how it is 
> synchronized then?
>
> Is the GC address root somehow shared between the processes? Or 
> does the forked process communicate the memory addresses back 
> to the parent?
>
> If so, does the GC just rely on this?
>
> Are freeing GC operations just locked while the forked process 
> is running?
>
> What happens if `GC.free()` is called manually while the 
> forked process marks the memory as free too but the GC 
> immediately uses the memory again and then gets the 
> notification to free it from the forked child? Can this happen?

The OS creates a clone of the process. The original process, which 
called fork(), is the parent and the clone is the child.
The parent resumes normally after the call to fork() returns, and 
the child starts the mark phase.
The virtual memory maps of both processes are identical at this 
point.
If either process writes to a page, the OS copies the page and 
writes the changes to the copy (Copy On Write).
Hence, pages modified in the parent after the fork are invisible 
to the child and can't be considered during the current 
collection cycle.
At the end of the mark phase the child communicates the result to 
the parent, then exits.
The parent can then complete the remaining work in parallel with 
the running program, as the pause is only required for the mark 
phase.
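
The sequence above can be sketched in C on a POSIX system. This is 
only an illustration, not the D runtime's implementation: the 
`ToyHeap` type, the `mark` and `forked_mark` functions, and the use 
of a pipe to return the result are all assumptions made for the 
example. The child marks its snapshot of a tiny "heap", sends a 
bitmask of live objects back to the parent, and exits.

```c
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define N_OBJECTS 8

/* Hypothetical toy heap: each object references at most one other
 * object (-1 means "no reference"); a single object is the root. */
typedef struct {
    int ref[N_OBJECTS];
    int root;
} ToyHeap;

/* Mark phase: follow references from the root and return a bitmask
 * with one bit set per reachable (live) object. */
uint32_t mark(const ToyHeap *h) {
    uint32_t live = 0;
    int i = h->root;
    while (i >= 0 && i < N_OBJECTS && !(live & (1u << i))) {
        live |= 1u << i;
        i = h->ref[i];
    }
    return live;
}

/* Fork, run the mark phase in the child, and hand the result back
 * to the parent through a pipe. Returns 0 on success, -1 on error. */
int forked_mark(const ToyHeap *h, uint32_t *live_out) {
    int fds[2];
    if (pipe(fds) != 0) return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: works on the snapshot taken at fork() time. */
        close(fds[0]);
        uint32_t live = mark(h);
        ssize_t n = write(fds[1], &live, sizeof live);
        _exit(n == (ssize_t)sizeof live ? 0 : 1);
    }

    /* Parent: a real program would keep running here; this sketch
     * simply waits for the child's result. */
    close(fds[1]);
    uint32_t live = 0;
    ssize_t n = read(fds[0], &live, sizeof live);
    close(fds[0]);

    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    if (n != (ssize_t)sizeof live || !WIFEXITED(status) ||
        WEXITSTATUS(status) != 0)
        return -1;

    *live_out = live;
    return 0;
}
```

Every object whose bit is clear in the returned mask was 
unreachable when fork() was called, so the parent could sweep it 
without having run the mark phase itself.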

This works because every chunk of memory that was unreferenced in 
the parent at the time of the fork is unreferenced in the child, 
too: the child is a clone that doesn't mutate state, except for 
the allocation required to hold the mark results.
There is no need to do anything about the GC in the parent; it 
can allocate and free memory at will.
This doesn't interfere because the chunks the child found to be 
unreferenced are still considered in use by the parent's GC, even 
though nothing references them and they are ready to be collected.
Once the child has communicated its result to the parent, the GC 
thread in the parent can complete the collection cycle as if it 
had done the mark phase itself.
Anything that happened in the parent after the call to fork() 
will be considered in the next collection cycle.
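
The snapshot behaviour can be checked with a small C sketch, 
assuming a POSIX system and ordinary private (non-shared) memory. 
The function name `snapshot_view` and the two pipes used for 
ordering are assumptions made for this example. The parent writes 
to a variable after fork(), then tells the child to read it; the 
child still sees the value from the time of the fork.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork, overwrite *cell in the parent, then ask the child which
 * value it sees. Returns 0 on success, -1 on error. */
int snapshot_view(int *cell, int new_value, int *seen_by_child) {
    int go[2], back[2];
    if (pipe(go) != 0) return -1;
    if (pipe(back) != 0) {
        close(go[0]);
        close(go[1]);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(go[0]);
        close(go[1]);
        close(back[0]);
        close(back[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: wait until the parent has done its write, then
         * report the value visible in the child's address space. */
        close(go[1]);
        close(back[0]);
        char token;
        if (read(go[0], &token, 1) != 1) _exit(1);
        int seen = *cell;
        ssize_t n = write(back[1], &seen, sizeof seen);
        _exit(n == (ssize_t)sizeof seen ? 0 : 1);
    }

    /* Parent: this write triggers copy-on-write, so it lands in the
     * parent's private copy of the page only. */
    close(go[0]);
    close(back[1]);
    *cell = new_value;

    char token = 'x';
    int ok = write(go[1], &token, 1) == 1;
    int seen = 0;
    ok = ok && read(back[0], &seen, sizeof seen) == (ssize_t)sizeof seen;
    close(go[1]);
    close(back[0]);

    int status = 0;
    if (waitpid(pid, &status, 0) < 0) ok = 0;
    if (!ok || !WIFEXITED(status) || WEXITSTATUS(status) != 0) return -1;

    *seen_by_child = seen;
    return 0;
}
```

If `cell` holds 1 before the call and `new_value` is 2, the child 
reports 1 while the parent's copy holds 2. In GC terms, the 
parent's change is simply not part of the cycle the child is 
working on.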