[dmd-concurrency] D's Memory Model

Wed Feb 10 12:21:14 PST 2010

On Wed, 10 Feb 2010 06:16:08 -0500, Fawzi Mohamed <fawzi at gmx.ch> wrote:
[snip]
>> A collection cycle, however, will still need to pause all threads and  
>> do the whole shebang.
>
> yes if that is parallelized then I don't think that the GC time overhead  
> is much larger than having separate local GC (just trigger the global GC  
> collection when the global allocation exceed X).

Parallel GCs do have a fair amount of overhead. They have to pause and
restart every running thread plus every GC thread. The marking algorithm
itself is usually lock-free, so every mark-bit read or write is atomic at
a minimum. Some implementations also maintain a to-mark list, which needs
to be synchronized for load-balancing reasons. And parallel GCs don't
really address the embarrassing pause problem.
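
To make the mark-bit cost concrete, here is a minimal sketch of what
"atomic at a minimum" looks like. It is written in C++ rather than D, and
the names (MarkBitmap, tryMark, isMarked) are hypothetical; this is not
taken from any particular collector. The assumption is one mark bit per
heap slot, packed into 64-bit words, with several marker threads racing to
claim the same object:

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical mark bitmap: one bit per heap slot, packed into 64-bit words.
class MarkBitmap {
public:
    explicit MarkBitmap(std::size_t slots) : words_((slots + 63) / 64) {
        // Explicitly clear every word so the starting state is unambiguous.
        for (auto& w : words_) w.store(0, std::memory_order_relaxed);
    }

    // Atomically sets the bit for `slot`. Returns true only for the one
    // caller that flipped it from 0 to 1, so exactly one marker thread
    // goes on to scan the object. Relaxed ordering is an assumption made
    // for brevity; a real collector may need stronger ordering when it
    // publishes work to other threads.
    bool tryMark(std::size_t slot) {
        assert(slot / 64 < words_.size());
        const std::uint64_t mask = std::uint64_t{1} << (slot % 64);
        const std::uint64_t old =
            words_[slot / 64].fetch_or(mask, std::memory_order_relaxed);
        return (old & mask) == 0;
    }

    // Atomic read of a single mark bit.
    bool isMarked(std::size_t slot) const {
        assert(slot / 64 < words_.size());
        const std::uint64_t mask = std::uint64_t{1} << (slot % 64);
        return (words_[slot / 64].load(std::memory_order_relaxed) & mask) != 0;
    }

private:
    std::vector<std::atomic<std::uint64_t>> words_;
};
```

Every tryMark is an atomic read-modify-write on a word that other marker
threads may be hitting at the same time, which is the per-object cost a
single-threaded (or thread-local) marker does not pay.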

[snip]
> Both these GC approaches work without shared, the second one could have  
> a reduced overhead due to shared.
> The gain due to shared is not so clear to me because the main bottleneck  
> (allocation global lock) is removed in both cases.

The main bottleneck in GCs is collection, not allocation, because the
solution to the allocation problem is well known and simple. We just
haven't done it yet in D. What shared brings to the table is the ability
to eliminate false sharing, which affects C/C++/C#/Java/etc., and to give
first-class support to thread-local garbage collection, which, besides its
great theoretical properties, is being used practically to great effect by
Apple. And while, granted, we don't have to impose actual separation
between memory pools to achieve the false sharing guarantee, if we do
there's an opportunity to choose the best GC for each memory type, instead
of one that can handle everything, but does so poorly.
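
For anyone unfamiliar with false sharing, a minimal layout sketch follows,
again in C++ and again with hypothetical names. The 64-byte cache line is
an assumption (common on x86, not universal). Two counters written by
different threads sit 8 bytes apart in the packed layout, so they can land
on the same line and each write invalidates the other core's copy. Forcing
each onto its own line removes that, which is roughly what an allocator
that knows data is thread-local can do for free by keeping each thread's
allocations in its own pool:

```cpp
#include <cstddef>
#include <cstdint>

// Assumed cache line size; not guaranteed on every CPU.
constexpr std::size_t kCacheLine = 64;

// Two per-thread counters placed back to back: 8 bytes apart, so both
// can fall on one cache line and contend even though no data is
// logically shared.
struct PackedCounters {
    std::uint64_t a;  // written by thread A
    std::uint64_t b;  // written by thread B
};

// Each counter is aligned (and therefore padded) to a full cache line.
struct alignas(kCacheLine) PaddedCounter {
    std::uint64_t value;
};

// With the padded type, the two counters can never share a line.
struct SeparatedCounters {
    PaddedCounter a;  // written by thread A
    PaddedCounter b;  // written by thread B
};

// Distance in bytes between the two counters in each layout.
constexpr std::size_t kPackedGap =
    offsetof(PackedCounters, b) - offsetof(PackedCounters, a);
constexpr std::size_t kSeparatedGap =
    offsetof(SeparatedCounters, b) - offsetof(SeparatedCounters, a);

static_assert(kPackedGap < kCacheLine, "packed counters can share a line");
static_assert(kSeparatedGap >= kCacheLine, "padded counters cannot");
```

The padding trick costs memory per object and has to be applied by hand;
the point of the argument above is that a type system that distinguishes
shared from thread-local data lets the runtime make that separation
itself.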