[dmd-concurrency] D's Memory Model

Wed Feb 10 01:59:13 PST 2010

On Wed, 10 Feb 2010 02:25:12 -0500, Walter Bright <walter at digitalmars.com>  
wrote:
> Robert Jacques wrote:
>>
>> Yes, there was one important language change which my first post  
>> advocated: the separation of local and shared (+immutable & unique)  
>> data into logically separate memory pools. I'm sorry that I didn't make  
>> it clearer. From a TDPL point of view, this allows implementations to  
>> eliminate false sharing (an important fact to highlight) and to use  
>> thread-local allocation and/or thread-local collectors. The downside is  
>> that it adds language complexity and makes casting from local to  
>> shared/immutable/unique implementation defined.
>>
>
> There's another way to do this that requires no language changes. Simply  
> give each thread its own thread-local pool to allocate from, but treat  
> memory, once allocated, as global. This allows individual allocations to  
> be cast to shared, but the vast bulk will wind up being thread local  
> without sharing cache lines, and it will not require locks to be taken  
> for most allocations.
>
> A collection cycle, however, will still need to pause all threads and do  
> the whole shebang.
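
A minimal Python simulation of the scheme quoted above. It models address layout only and is not D runtime code. Each thread bump-allocates from its own chunk of one global address space and takes the global lock only to refill. The names GlobalHeap and ThreadLocalAllocator, the 64-byte cache line, and the 4096-byte chunk size are hypothetical.

```python
import threading

CACHE_LINE = 64    # assumed cache-line size in bytes
CHUNK_SIZE = 4096  # hypothetical size of the block a thread takes from the global heap

class GlobalHeap:
    """Hands out cache-line-aligned chunks of one simulated, process-wide address space."""

    def __init__(self):
        self._lock = threading.Lock()
        self._next = 0
        self.lock_acquisitions = 0

    def take_chunk(self):
        # The global lock is taken only when a thread needs a fresh chunk.
        with self._lock:
            self.lock_acquisitions += 1
            start = self._next
            self._next += CHUNK_SIZE
            return start

class ThreadLocalAllocator:
    """Bump-allocates from a per-thread chunk; the resulting addresses are globally valid."""

    def __init__(self, heap):
        self._heap = heap
        self._tls = threading.local()

    def alloc(self, size):
        assert 0 < size <= CHUNK_SIZE, "large objects are out of scope for this sketch"
        tls = self._tls
        if not hasattr(tls, "cur") or tls.cur + size > tls.end:
            tls.cur = self._heap.take_chunk()  # refill: the only locked path
            tls.end = tls.cur + CHUNK_SIZE
        addr = tls.cur
        tls.cur += size
        return addr

def lines(addrs, size):
    """Cache-line indices touched by objects of `size` bytes at `addrs`."""
    return {ln for a in addrs for ln in range(a // CACHE_LINE, (a + size - 1) // CACHE_LINE + 1)}

heap = GlobalHeap()
allocator = ThreadLocalAllocator(heap)
results = {}

def worker(name):
    results[name] = [allocator.alloc(16) for _ in range(1000)]

threads = [threading.Thread(target=worker, args=(n,)) for n in ("t1", "t2")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("cache lines shared between threads:",
      len(lines(results["t1"], 16) & lines(results["t2"], 16)))  # -> 0
print("global lock acquisitions:", heap.lock_acquisitions)  # -> 8

# An object from t1's pool that is later cast to shared still sits on a cache line
# packed with t1's other objects, which is the false-sharing risk raised in the reply.
first = results["t1"][0]
neighbours = [a for a in results["t1"][1:] if a // CACHE_LINE == first // CACHE_LINE]
print("t1 objects sharing a line with the first one:", len(neighbours))  # -> 3
```

The bulk of allocations never cross threads on a cache line, and only chunk refills lock. The last three statements show the limitation: nothing stops a cast-to-shared object from sharing its line with local neighbours.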

Yes. The reason I said that casting from local to shared was  
implementation-defined was that I knew the operation could be valid or  
invalid depending on the type of GC used. However, your comment sounds like  
the standard C/C++ approach to thread-local allocation, which I discussed in  
the original post. In short, this model has been shown to be insufficient  
in C/C++ with regard to false sharing, and it has led to several hacks to  
remove false sharing at identifiable hot spots. Your comment also doesn't  
address the key challenge for D's GC, which is: where is the garbage  
returned to? (I also discussed this.) Anyway, in order to prevent false  
sharing, small immutable objects need their own memory pool, and shared  
objects need to be created in a different manner than local objects. To me,  
this creates a logical separation of their memory pools and, at a minimum,  
requires the GC implementation to perform allocation using specific flags  
for each type qualifier.
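
To make the per-qualifier argument concrete, here is a minimal Python simulation (a model of address layout, not D runtime code or D's GC API). The allocator is passed the qualifier as a flag and routes each request to its own pool. The Qualifier enum, QualifiedAllocator, the 16-byte packing granularity, the 64-byte cache line, and the per-pool address span are all assumptions for illustration. Real pools would also need locking for the shared and immutable cases, which the sketch omits.

```python
from enum import Enum

CACHE_LINE = 64      # assumed cache-line size in bytes
POOL_SPAN = 1 << 20  # hypothetical address range reserved per pool (a multiple of CACHE_LINE)

class Qualifier(Enum):
    LOCAL = "local"
    IMMUTABLE = "immutable"
    SHARED = "shared"

def round_up(n, m):
    return (n + m - 1) // m * m

class Pool:
    """Bump allocator over its own cache-line-aligned address range."""

    def __init__(self, base, granularity):
        self.base = base
        self.next = base
        self.granularity = granularity

    def alloc(self, size):
        addr = self.next
        self.next += round_up(size, self.granularity)
        assert self.next <= self.base + POOL_SPAN, "pool exhausted (not handled in this sketch)"
        return addr

class QualifiedAllocator:
    """Chooses a pool from the type qualifier passed as a flag (hypothetical design)."""

    def __init__(self):
        self._pools = {}

    def _pool(self, key, granularity):
        if key not in self._pools:
            # Each pool gets a disjoint, aligned range, so two pools never share a cache line.
            self._pools[key] = Pool(len(self._pools) * POOL_SPAN, granularity)
        return self._pools[key]

    def alloc(self, size, qual, thread):
        if qual is Qualifier.LOCAL:
            # One pool per thread: local data never shares a line with another thread's.
            return self._pool((qual, thread), 16).alloc(size)
        if qual is Qualifier.IMMUTABLE:
            # Small immutable objects are packed together, away from mutable data.
            return self._pool((qual,), 16).alloc(size)
        # Shared objects are padded to whole cache lines so no neighbour can false-share.
        return self._pool((qual,), CACHE_LINE).alloc(size)

def lines(addr, size):
    """Cache-line indices covered by [addr, addr + size)."""
    return set(range(addr // CACHE_LINE, (addr + size - 1) // CACHE_LINE + 1))

allocator = QualifiedAllocator()
objs = []  # (address, size, qualifier, thread)
for t in ("t1", "t2"):
    for _ in range(10):
        objs.append((allocator.alloc(24, Qualifier.LOCAL, t), 24, Qualifier.LOCAL, t))
    objs.append((allocator.alloc(24, Qualifier.SHARED, t), 24, Qualifier.SHARED, t))
    objs.append((allocator.alloc(8, Qualifier.IMMUTABLE, t), 8, Qualifier.IMMUTABLE, t))

# Every shared object must occupy cache lines that no other object touches.
for i, (addr, size, qual, _) in enumerate(objs):
    if qual is Qualifier.SHARED:
        others = [lines(a, s) for j, (a, s, _, _) in enumerate(objs) if j != i]
        assert all(not (lines(addr, size) & o) for o in others)
print("no shared object shares a cache line with any other object")
```

The design choice the sketch highlights is that the allocator cannot make these placement decisions unless it is told the qualifier at allocation time, which is why a later cast from local to shared cannot simply be a no-op.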