[dmd-concurrency] D's Memory Model

Wed Feb 10 08:41:58 PST 2010

On Feb 10, 2010, at 3:59 AM, Fawzi Mohamed wrote:

> 
> On 10-feb-10, at 03:14, Robert Jacques wrote:
> 
>> On Tue, 09 Feb 2010 05:12:29 -0500, Fawzi Mohamed <fawzi at gmx.ch> wrote:
>>> On 8-feb-10, at 19:15, Robert Jacques wrote:
>>>> On Mon, 08 Feb 2010 06:07:36 -0500, Fawzi Mohamed <fawzi at gmx.ch> wrote:
>> 
>> [snip]
>> 
>>>> Concurrent collectors trade throughput (GC time / program time) for program responsiveness. They also often trade program correctness for latency, as they are notoriously hard to get correct for languages with mutation. While, there are some very elegance concurrent GCs for functional languages, the high level descriptions of the Boehm concurrent GC seems to be inherently racy, while Java concurrent GCs tend to be highly instrumented and lock-tastic (mark-bit locking on every ref assignment).
>>> that can be done with atomic ops...
>> 
>> It can only be done correctly using a DCAS, (not dwCAS), so isn't implementable on any mainstream chip. So you have to lock the mark bits on an object. This can be done as a spin-lock using atomic ops, but that doesn't change the number or frequency they have to be taken.
> 
> I might miss something (and I don't know what DCAS is) but as you always go in one direction (unmarked -> marked) I don't see why something like (for 32 bit)
> 
> mark(uint[]bitfield,uint bit){
> 	auto idx=(bit+31)/32;
> 	auto mask=1>>(bit&0x1F);
> 	volatile auto oldV=bitfield[idx];
> 	while((oldV & mask)==0){
> 		auto oldV=atomicCAS(bitfield[idx],oldV | mask,oldV);
> 	}
> }
> 
> this can be extended if you have 3 states (white,gray,black) for concurrent GC or to refCounting.
> You have collisions only when two threads try to modify the same memory location, and even then given that the function is so fast to compute it should not be a big problem.
> False sharing makes things worse, but also that should not be something that cannot be made acceptable distributing a bit the mark bits in the memory.
> 
> At the end of the mark phase you need a memory barrier in each working thread, and a semaphore (or some flag) to ensure that all changes have global visibility, but that is it.

This works in a world where all reference updates are done via in-language operations.  But D supports both extern (C) functions and inline assembly, both of which defeat incremental collection.