Thread GC non "stop-the-world"

Mon Sep 22 17:15:51 PDT 2014

The cost of using the current GC in D, although beneficial for 
many types of programs, is unaffordable for programs such as 
games that must perform repetitive tasks at short, regular 
intervals. A GC.malloc/realloc on any thread can trigger a memory 
collection that stops ALL threads of the program for a variable 
amount of time, which rules the GC out for such programs. Forum 
threads such as "RFC: reference Counted Throwable" and "Escaping 
the Tyranny of the GC: std.rcstring, first blood", as well as the 
@nogc attribute, show that this is increasingly perceived as a 
problem.
Besides the ever-recurring "reference counting", many people 
propose improving the current GC implementation. Rainer Schuetze 
developed a concurrent GC for Windows:

    http://rainers.github.io/visuald/druntime/concurrentgc.html
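
For readers who have not used it, here is a minimal sketch of what 
the @nogc attribute mentioned above buys today. The function name 
sumOfSquares is made up for illustration; malloc and free are the 
C library functions exposed through core.stdc.stdlib.

```d
import core.stdc.stdlib : free, malloc;

// @nogc makes the compiler reject anything in this function that could
// allocate from the GC heap (`new`, array appends, allocating closures),
// so calling it can never trigger a collection.
@nogc nothrow
long sumOfSquares(size_t n)
{
    if (n == 0)
        return 0;

    // Scratch memory comes from the C heap instead of the GC heap.
    auto buf = cast(long*) malloc(n * long.sizeof);
    if (buf is null)
        return -1; // allocation failed
    scope (exit) free(buf);

    foreach (i; 0 .. n)
        buf[i] = cast(long) (i * i);

    long total = 0;
    foreach (i; 0 .. n)
        total += buf[i];
    return total;
}
```

This only protects the annotated function: a GC allocation on any 
other thread can still pause the thread that is running it.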

With some (or a lot of) work and a little help from the compiler 
(currently it indicates with a flag whether a class/struct 
contains pointers/references to other classes/structs; it could 
extend this support to indicate which fields are 
pointers/references) we could implement an 
incremental-generational-semi-conservative-copying GC like:

    http://www.hboehm.info/gc/
or
    http://www.ravenbrook.com/project/mps/

Being incremental, they try to minimize the "stop-the-world" 
phase. But even with an advanced GC, as programs become more 
complex and use more memory, pause times also increase. See for 
example (I know it's not the normal case, but in a few years ...):

http://blog.mgm-tp.com/2014/04/controlling-gc-pauses-with-g1-collector

(*) What if:
- "__gshared" variables are forbidden from holding 
references/pointers to objects allocated by the GC (if the 
compiler can help enforce this prohibition, perfect; if not, the 
developer has to know what he is doing)
- "shared" types are not allocated by the GC (they could be 
reference counted or manually released or ...)
- "immutable" types are no longer implicitly "shared"

In short, the memory accessible from multiple threads is not 
managed by the GC.
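
The proposal leans on a split D already has: module-level 
variables are thread-local unless marked "shared" or "__gshared". 
A small sketch of that behavior follows; the names perThread, 
processWide and readFromNewThread are made up for illustration.

```d
import core.thread : Thread;

int perThread = 0;             // thread-local: every thread gets its own copy
__gshared int processWide = 0; // a single instance visible to all threads

// Reads both variables from a freshly started thread and returns
// [value of perThread, value of processWide] as seen by that thread.
int[2] readFromNewThread()
{
    int[2] seen;
    auto t = new Thread({
        seen[0] = perThread;
        seen[1] = processWide;
    });
    t.start();
    t.join();
    return seen;
}

unittest
{
    perThread = 1;   // changes only the calling thread's copy
    processWide = 1; // changes the one process-wide instance

    auto seen = readFromNewThread();
    assert(seen[0] == 0); // the new thread starts from the initializer
    assert(seen[1] == 1); // the new thread sees the shared write
}
```

Under the restrictions above, only data of the first kind could 
point into a GC heap, so a collector would never need to look at 
another thread's memory.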

With these restrictions each thread would have its own 
"I_Allocator", whose default implementation would be an 
incremental-generational-semi-conservative-copying GC that does 
not interfere with any other thread of the program (it would be 
responsible only for the memory reserved for its own thread). 
Other implementations of "I_Allocator" could be based on Andrei's 
allocators. With "setThreadAllocator" (similar to the current 
gc_setProxy) you could switch between the different 
implementations as needed. Threads with critical time 
requirements could work with an implementation of "I_Allocator" 
not based on the GC. It would also be possible to simulate scoped 
classes:

{
	// Route this thread's allocations to a stack-like allocator
	setThreadAllocator(I_Allocator_pseudo_stack);
	scope(exit) {
		I_Allocator_pseudo_stack.deleteAll();
		setThreadAllocator(I_Allocator_gc);
	}
	auto obj = new MyClass();
	...
	// On scope exit: destructors are called and memory is released
}
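
To make the idea a bit more concrete, here is a rough sketch of 
how such an interface could look in today's D. Only the names 
I_Allocator, setThreadAllocator and deleteAll come from the 
proposal; the allocate method, PseudoStackAllocator, 
threadAllocator and make are assumptions of mine. It is also 
simplified: deleteAll here only rewinds the region, so destructors 
have to be run explicitly with destroy.

```d
import core.stdc.stdlib : free, malloc;
import std.conv : emplace;

// Hypothetical interface; the method set is an assumption.
interface I_Allocator
{
    void* allocate(size_t size);
    void deleteAll();
}

// Bump-pointer region on the C heap, standing in for
// I_Allocator_pseudo_stack. The GC does not scan this memory.
final class PseudoStackAllocator : I_Allocator
{
    private ubyte* base;
    private size_t capacity, used;

    this(size_t capacity)
    {
        base = cast(ubyte*) malloc(capacity);
        assert(base !is null, "out of memory");
        this.capacity = capacity;
    }

    ~this() { free(base); }

    void* allocate(size_t size)
    {
        // Round every request up to a multiple of 16 bytes.
        immutable rounded = (size + 15) & ~cast(size_t) 15;
        if (rounded > capacity - used)
            return null;
        void* p = base + used;
        used += rounded;
        return p;
    }

    // Simplification: rewinds the region without running destructors.
    void deleteAll() { used = 0; }

    size_t bytesUsed() const { return used; }
}

// Module-level variable, so each thread has its own current allocator.
I_Allocator threadAllocator;

// Installs a new allocator for the calling thread, returns the old one.
I_Allocator setThreadAllocator(I_Allocator a)
{
    auto previous = threadAllocator;
    threadAllocator = a;
    return previous;
}

// Constructs a class instance in memory from the thread's allocator.
T make(T, Args...)(Args args) if (is(T == class))
{
    enum size = __traits(classInstanceSize, T);
    void* p = threadAllocator.allocate(size);
    assert(p !is null, "region exhausted");
    return emplace!T(p[0 .. size], args);
}

unittest
{
    static class MyClass { int value = 42; }

    auto stack = new PseudoStackAllocator(1024);
    auto previous = setThreadAllocator(stack);
    scope (exit)
    {
        stack.deleteAll();
        setThreadAllocator(previous);
    }

    auto obj = make!MyClass();
    assert(obj.value == 42);
    assert(stack.bytesUsed >= __traits(classInstanceSize, MyClass));
    destroy(obj); // run the destructor before the region is rewound
}
```

The unittest block can be run with "dmd -unittest -main". A real 
version would need the compiler to route "new" through the 
thread's allocator, which is exactly the part that cannot be done 
as a library today.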

Obviously the changes in (*) break compatibility with existing 
code, and therefore may not be appropriate for D2. These are also 
only general ideas, and surely these changes lead to other 
problems. But the point I want to convey is that, in my opinion, 
while those problems are solvable, a language for "system 
programming" is incompatible with shared data managed by a GC.

Thoughts?