Idiomatic D using GC as a library writer

Sun Dec 4 23:25:34 UTC 2022

On Sunday, 4 December 2022 at 22:46:52 UTC, Ali Çehreli wrote:
> That's way beyond my pay grade. Explain please. :)

The reason that the GC stops threads right now is to ensure that 
something doesn't change in the middle of its analysis.

For example, suppose the GC scans addresses 0-1000 and finds 
nothing. Then a running thread moves a reference from memory 
address 2200 down to address 800 while the GC is scanning 
1000-2000.

Then the GC scans 2000-3000, where the reference used to be, but 
it isn't there anymore... and the GC has no clue it needs to scan 
address 800 again. Having never seen a reference to the object, 
it assumes the object is dead and frees it.

Then the thread tries to use the object, leading to a crash.

The current implementation prevents this by stopping all threads. 
If nothing is running, nothing can move objects around while the 
GC is trying to find them.
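
To make the failure concrete, here is a minimal single-threaded 
sketch of that scenario. Everything in it is made up for 
illustration: the "heap" is just an int array where a slot holds 
an object id (0 means empty), `scanRange` and `survivesScan` are 
hypothetical names, and the other thread is simulated by calling 
a nested function between scan steps. It is not how druntime's GC 
is implemented.

```d
/// Marks every object id found in heap[from .. to].
void scanRange(const(int)[] heap, size_t from, size_t to, ref bool[int] marked)
{
    foreach (slot; heap[from .. to])
        if (slot != 0)
            marked[slot] = true;
}

/// Runs the three-step scan from the example and reports whether
/// object 42 was seen. `mutator` stands in for the running thread.
bool survivesScan(bool stopTheWorld)
{
    auto heap = new int[3000];
    heap[2200] = 42;                // the only reference to object 42
    bool[int] marked;

    void mutator()
    {
        heap[800] = heap[2200];     // move the reference down...
        heap[2200] = 0;             // ...and clear the old location
    }

    scanRange(heap, 0, 1000, marked);
    if (!stopTheWorld) mutator();   // thread runs in the middle of the scan
    scanRange(heap, 1000, 2000, marked);
    scanRange(heap, 2000, 3000, marked);
    if (stopTheWorld) mutator();    // thread only resumes after the scan

    return (42 in marked) !is null;
}

unittest
{
    assert(!survivesScan(false));   // reference missed: object would be freed
    assert(survivesScan(true));     // threads stopped: object is found
}
```

Running it with `dmd -unittest -main -run sketch.d` (the file name 
is arbitrary) should pass both assertions.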

But actually stopping everything 1) requires that the GC know 
which threads exist and have a way to stop them, and 2) is 
overkill! All it really needs to do is prevent the operations 
that might change the GC's analysis while it is running, like the 
move in the example. It isn't important to stop numeric work, 
since that won't affect what the GC sees. It isn't important to 
stop pointer reads either (well, not in D's GC anyway; some 
collectors do need to stop those).

Since what the GC cares about is where pointers are stored, it is 
possible to hook pointer writes specifically. These hooks are 
called write barriers; they either block pointer writes or at 
least notify the GC about them. (And btw, not all pointer writes 
need to be blocked either, just the ones that would point to a 
different memory block, so things like slice iteration can also 
be allowed to continue. More on my blog: 
http://dpldocs.info/this-week-in-d/Blog.Posted_2022_10_31.html#thoughts-on-pointer-barriers )
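
As a rough sketch of the "different memory block" check, assuming 
the barrier gets to see both the old and the new pointer value: 
`needsBarrier` is a hypothetical helper, not something the 
compiler or druntime provides. `GC.addrOf` is the real 
`core.memory` call that returns the base address of the GC block 
containing a pointer (or null if the GC doesn't own it).

```d
import core.memory : GC;

/// Hypothetical filter: would overwriting `oldValue` with `newValue`
/// make the slot point into a different GC block?
/// Note: two non-GC pointers both map to null here, so they count
/// as "same block" in this sketch.
bool needsBarrier(const(void)* oldValue, const(void)* newValue)
{
    return GC.addrOf(oldValue) !is GC.addrOf(newValue);
}

unittest
{
    auto a = new int[16];
    auto b = new int[16];
    assert(!needsBarrier(&a[0], &a[5])); // walking within one array
    assert(needsBarrier(&a[0], &b[0]));  // now points at another block
}
```

A real barrier would want something much cheaper than a block 
lookup on every write; this is only meant to show which writes 
could be let through.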

So what happens then:

The GC scans addresses 0-1000 and finds nothing.

Then a running thread moves a reference from memory address 2200 
down to address 800... which would trigger the write barrier. The 
thread isn't allowed to complete this operation until the GC is 
done. Notice that the GC didn't have to know about this thread 
ahead of time, since the running thread is responsible for 
communicating its intentions to the GC as it happens. 
(Essentially, the GC holds a mutex and all pointer writes in 
generated D code are synchronized on it, though there are various 
ways to implement this.)
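
Here is a minimal sketch of that mutex idea, under heavy 
assumptions: the "heap" is again a toy int array of object ids, 
and `gcLock`, `barrierWrite`, `mutator` and `scanFinds` are all 
hypothetical names. In a real implementation the compiler would 
have to emit the barrier around pointer writes; here it is just 
called by hand. `Mutex` and `Thread` are the real druntime types.

```d
import core.sync.mutex : Mutex;
import core.thread : Thread;

__gshared Mutex gcLock; // hypothetical lock held by the collector while scanning
__gshared int[] heap;   // toy heap: a slot holds an object id, 0 means empty

/// Hypothetical barrier wrapped around every pointer write.
void barrierWrite(size_t slot, int value)
{
    gcLock.lock();                  // blocks while a scan is in progress
    scope (exit) gcLock.unlock();
    heap[slot] = value;
}

/// The running thread from the example: moves the reference 2200 -> 800.
void mutator()
{
    immutable r = heap[2200];       // reads are not intercepted
    barrierWrite(800, r);
    barrierWrite(2200, 0);
}

/// Scans the whole heap while holding the lock and reports whether
/// object `id` was found. The GC never registers or suspends the thread.
bool scanFinds(int id)
{
    bool found;
    gcLock.lock();
    auto t = new Thread(&mutator);
    t.start();                      // thread shows up mid-scan
    foreach (slot; heap)
        if (slot == id)
            found = true;
    gcLock.unlock();                // scan done: release the barrier
    t.join();                       // the paused writes now complete
    return found;
}

unittest
{
    gcLock = new Mutex;
    heap = new int[3000];
    heap[2200] = 42;
    assert(scanFinds(42));                      // still at 2200 during the scan
    assert(heap[800] == 42 && heap[2200] == 0); // move finished afterwards
}
```

Because the lock is taken before the thread starts, the thread's 
first write cannot land until the scan is over, so the outcome 
doesn't depend on timing.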

Then the GC scans 2000-3000, and the object is still there since 
the write is paused! It doesn't free it.

The GC finishes its work and releases the barriers. The thread 
now resumes and finishes the move, with the object still alive 
and well. No crash.

This would be a concurrent GC: it would not stop threads that are 
doing self-contained work, and it would also be more compatible 
with external threads, since any thread, no matter where it came 
from, would go through that GC mutex barrier.