[dmd-concurrency] D's Memory Model

Wed Feb 10 03:16:08 PST 2010

On 10-feb-10, at 08:25, Walter Bright wrote:

> Robert Jacques wrote:
>>
>> Yes, there was one important language change which my first post  
>> advocated: the separation of local and shared (+immutable & unique)  
>> data into logically separate memory pools. I'm sorry that I didn't  
>> make it clearer. From a TDPL point of view, this allows  
>> implementations to eliminate false sharing (an important fact to  
>> highlight) and to use thread-local allocation and/or thread-local  
>> collectors. The downside is that it adds language complexity and  
>> makes casting from local to shared/immutable/unique implementation  
>> defined.
>>
>
> There's another way to do this that requires no language changes.  
> Simply have each thread have its own thread local pool to allocate  
> from, but memory once allocated is treated as global. This allows  
> individual allocations to be cast to shared, but the vast bulk will  
> wind up being thread local without sharing cache lines, and it will  
> not require locks to be taken for most allocations.

Yes, this is exactly what I was advocating; I was probably too unclear
about it. Doing this removes the worst contention (the global GC lock
on allocation) for all the "smallish" allocations. Large allocations
will probably still take a global lock somewhere, but that is probably
OK.
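To make the scheme concrete, here is a minimal Python simulation. It assumes a bump allocator that refills a per-thread pool in fixed-size chunks from a single global heap. GlobalHeap, ThreadLocalAllocator, CHUNK_SIZE and LARGE_THRESHOLD are hypothetical names, not druntime's, and "addresses" are plain integers. Small allocations only touch the thread's own chunk, so the global lock is taken once per chunk refill and once per large allocation. The returned memory is still ordinary global memory that could later be cast to shared.

```python
import threading

CHUNK_SIZE = 64 * 1024       # hypothetical size of one thread-local pool refill
LARGE_THRESHOLD = 8 * 1024   # hypothetical cut-off above which objects bypass the pool


class GlobalHeap:
    """Shared address space; every reservation goes through one global lock."""

    def __init__(self):
        self.lock = threading.Lock()
        self.next_addr = 0
        self.lock_acquisitions = 0  # counts how often the global lock was taken

    def reserve(self, size):
        with self.lock:
            self.lock_acquisitions += 1
            addr = self.next_addr
            self.next_addr += size
            return addr


class ThreadLocalAllocator:
    """Per-thread bump allocation; the memory handed out is still global memory."""

    def __init__(self, heap):
        self.heap = heap
        self.pool = threading.local()  # each thread sees its own cur/end fields

    def alloc(self, size):
        if size >= LARGE_THRESHOLD:
            # Large object: reserve it directly from the global heap (takes the lock).
            return self.heap.reserve(size)
        pool = self.pool
        if getattr(pool, "cur", None) is None or pool.cur + size > pool.end:
            # Private chunk missing or exhausted: refill it (takes the lock).
            start = self.heap.reserve(CHUNK_SIZE)
            pool.cur, pool.end = start, start + CHUNK_SIZE
        # Fast path: bump this thread's private cursor, no lock needed.
        addr = pool.cur
        pool.cur += size
        return addr


heap = GlobalHeap()
allocator = ThreadLocalAllocator(heap)
results = {}


def worker(tid):
    results[tid] = [allocator.alloc(32) for _ in range(10_000)]


threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(heap.lock_acquisitions)  # 4 threads x 5 chunk refills = 20
```

With 32-byte objects and 64 KiB chunks, each thread needs 5 refills for 10,000 allocations, so the four threads take the global lock 20 times instead of 40,000 times. Because every chunk is used by a single thread, small objects from different threads never share a chunk, which is what avoids false sharing as long as chunk boundaries are cache-line aligned (as they are in this toy).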

> A collection cycle, however, will still need to pause all threads  
> and do the whole shebang.

Yes. If that collection is parallelized, I don't think the GC time
overhead is much larger than with separate thread-local GCs (just
trigger the global collection when global allocation exceeds some X).
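The trigger can stay just as cheap if the global heap only accounts for memory at chunk granularity. Here is a minimal sketch, assuming the counter is updated only when a thread refills its pool or allocates a large object. CollectionTrigger, on_global_alloc and the collect callback are hypothetical names, and the callback stands in for the stop-the-world (ideally parallel) collection.

```python
import threading


class CollectionTrigger:
    """Requests a global collection once X bytes have been handed out since the last one."""

    def __init__(self, threshold_bytes, collect):
        self.threshold = threshold_bytes   # the "X" from the text
        self.collect = collect             # hypothetical stop-the-world (parallel) collection
        self.lock = threading.Lock()
        self.allocated = 0
        self.collections = 0

    def on_global_alloc(self, size):
        # Called only from the slow path (chunk refill or large object), so the
        # per-object fast path in each thread never touches this shared counter.
        with self.lock:
            self.allocated += size
            if self.allocated < self.threshold:
                return
            self.allocated = 0
            self.collections += 1
        self.collect()  # run the collection without holding the counter lock


trigger = CollectionTrigger(threshold_bytes=256 * 1024, collect=lambda: None)
for _ in range(10):
    trigger.on_global_alloc(64 * 1024)  # ten 64 KiB chunk refills
print(trigger.collections)  # 2
```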

In this setting one can also try to have a concurrent GC, but that is
more complex (that was my subsequent discussion), because then one has
to ensure that modifications made during the mark phase don't lead to
"lost" objects.
Suppose you have a.ptr = b and the mutator does { c.ptr = a.ptr; a.ptr = null; }.
If you are unlucky, the mark phase will lose b even though it is still
reachable through c: when the marker looks at c.ptr, c does not yet
point to b, and when it looks at a.ptr, a no longer points to b.
There are several methods to avoid this (e.g. write barriers), but all
of them introduce an overhead.
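The lost-object race can be reproduced with a small Python simulation of incremental tri-colour marking interleaved with the two mutator stores. This is a sketch, not D's collector. Each object has a single ptr field, and the marker is driven step by step. The fix shown is only one of the possible methods: a Dijkstra-style insertion write barrier that shades the new target of every pointer store while marking is in progress. IncrementalMarker and write_ptr are hypothetical names.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.ptr = None  # one outgoing reference, as in the example


class IncrementalMarker:
    """Tri-colour marking that can be interleaved with mutator steps.

    white = not in `marked`; grey = in `marked` and still in `worklist`;
    black = in `marked` and already scanned.
    """

    def __init__(self, roots, use_barrier):
        self.use_barrier = use_barrier
        self.marked = set()
        self.worklist = []
        for r in roots:
            self.shade(r)

    def shade(self, obj):
        # White -> grey: remember the object and queue it for scanning.
        if obj is not None and obj.name not in self.marked:
            self.marked.add(obj.name)
            self.worklist.append(obj)

    def scan_one(self):
        # Grey -> black: shade whatever the object points to right now.
        obj = self.worklist.pop(0)
        self.shade(obj.ptr)

    def finish(self):
        while self.worklist:
            self.scan_one()

    def write_ptr(self, obj, value):
        # Mutator pointer store. With the insertion barrier the new target is
        # shaded, so a black object can never end up pointing to a white one.
        if self.use_barrier:
            self.shade(value)
        obj.ptr = value


def run(use_barrier):
    a, b, c = Obj("a"), Obj("b"), Obj("c")
    a.ptr = b                                   # a.ptr = b before marking starts
    gc = IncrementalMarker(roots=[c, a], use_barrier=use_barrier)
    gc.scan_one()                               # marker scans c: c.ptr is still null
    gc.write_ptr(c, a.ptr)                      # mutator: c.ptr = a.ptr
    gc.write_ptr(a, None)                       # mutator: a.ptr = null
    gc.finish()                                 # marker scans a: a.ptr is already null
    return "b" in gc.marked                     # b is reachable via c; was it marked?


print("without barrier, b marked:", run(False))  # False
print("with barrier,    b marked:", run(True))   # True
```

Without the barrier b stays white although c still points to it, so a sweep would free a live object. With the barrier, the store c.ptr = a.ptr greys b. The price is an extra check on every pointer store while marking runs. A Yuasa-style deletion barrier (shading the old value of a.ptr when it is overwritten) would also save b.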

Both of these GC approaches work without shared; the second one could
have reduced overhead thanks to shared.
The gain from shared is not so clear to me, because the main
bottleneck (the global allocation lock) is removed in both cases.

Fawzi