[dmd-concurrency] D's Memory Model

Fawzi Mohamed fawzi at gmx.ch
Wed Feb 10 09:28:39 PST 2010


On 10-feb-10, at 17:41, Sean Kelly wrote:

> On Feb 10, 2010, at 3:59 AM, Fawzi Mohamed wrote:
>
>>
>> On 10-feb-10, at 03:14, Robert Jacques wrote:
>>
>>> On Tue, 09 Feb 2010 05:12:29 -0500, Fawzi Mohamed <fawzi at gmx.ch>  
>>> wrote:
>>>> On 8-feb-10, at 19:15, Robert Jacques wrote:
>>>>> On Mon, 08 Feb 2010 06:07:36 -0500, Fawzi Mohamed <fawzi at gmx.ch>  
>>>>> wrote:
>>>
>>> [snip]
>>>
>>>>> Concurrent collectors trade throughput (GC time / program time)  
>>>>> for program responsiveness. They also often trade program  
>>>>> correctness for latency, as they are notoriously hard to get  
>>>>> correct for languages with mutation. While, there are some very  
>>>>> elegance concurrent GCs for functional languages, the high level  
>>>>> descriptions of the Boehm concurrent GC seems to be inherently  
>>>>> racy, while Java concurrent GCs tend to be highly instrumented  
>>>>> and lock-tastic (mark-bit locking on every ref assignment).
>>>> that can be done with atomic ops...
>>>
>>> It can only be done correctly using a DCAS, (not dwCAS), so isn't  
>>> implementable on any mainstream chip. So you have to lock the mark  
>>> bits on an object. This can be done as a spin-lock using atomic  
>>> ops, but that doesn't change the number or frequency they have to  
>>> be taken.
>>
>> I might miss something (and I don't know what DCAS is) but as you  
>> always go in one direction (unmarked -> marked) I don't see why  
>> something like (for 32 bit)
>>
>> mark(uint[]bitfield,uint bit){
>> 	auto idx=(bit+31)/32;
>> 	auto mask=1>>(bit&0x1F);
>> 	volatile auto oldV=bitfield[idx];
>> 	while((oldV & mask)==0){
>> 		auto oldV=atomicCAS(bitfield[idx],oldV | mask,oldV);
>> 	}
>> }
>>
>> this can be extended if you have 3 states (white,gray,black) for  
>> concurrent GC or to refCounting.
>> You have collisions only when two threads try to modify the same  
>> memory location, and even then given that the function is so fast  
>> to compute it should not be a big problem.
>> False sharing makes things worse, but also that should not be  
>> something that cannot be made acceptable distributing a bit the  
>> mark bits in the memory.
>>
>> At the end of the mark phase you need a memory barrier in each  
>> working thread, and a semaphore (or some flag) to ensure that all  
>> changes have global visibility, but that is it.
>
> This works in a world where all reference updates are done via in- 
> language operations.  But D supports both extern (C) functions and  
> inline assembly, both of which defeat incremental collection.

this works with a stop the world approach as well as now (and is  
parallel).

I agree that to use that in a concurrent GC then you need some way to  
ensure that an update to a reference does not loose references.
That is possible only if you have some control on the references  
updates, so either that has to be exposed to external programs, or you  
declare programs that modify references transferring the reference to  
an object from a memory location to another (simply loosing a  
reference or adding one is not a problem, only the combination is), as  
unsafe to be run during a GC mark phase if you use a concurrent GC and  
do not notify it of the change.
Not the ideal solution, but that is all you can do: ether you stop the  
threads, or some control on the modifications has to be done...



More information about the dmd-concurrency mailing list