[dmd-concurrency] D's Memory Model
Robert Jacques
sandford at jhu.edu
Wed Feb 10 12:59:36 PST 2010
On Wed, 10 Feb 2010 12:28:39 -0500, Fawzi Mohamed <fawzi at gmx.ch> wrote:
> On 10-feb-10, at 17:41, Sean Kelly wrote:
>> On Feb 10, 2010, at 3:59 AM, Fawzi Mohamed wrote:
>>> On 10-feb-10, at 03:14, Robert Jacques wrote:
>>>> On Tue, 09 Feb 2010 05:12:29 -0500, Fawzi Mohamed <fawzi at gmx.ch>
>>>> wrote:
>>>>> On 8-feb-10, at 19:15, Robert Jacques wrote:
>>>>>> On Mon, 08 Feb 2010 06:07:36 -0500, Fawzi Mohamed <fawzi at gmx.ch>
>>>>>> wrote:
>>>>
>>>> [snip]
>>>>
>>>>>> Concurrent collectors trade throughput (GC time / program time) for
>>>>>> program responsiveness. They also often trade program correctness
>>>>>> for latency, as they are notoriously hard to get correct for
>>>>>> languages with mutation. While, there are some very elegance
>>>>>> concurrent GCs for functional languages, the high level
>>>>>> descriptions of the Boehm concurrent GC seems to be inherently
>>>>>> racy, while Java concurrent GCs tend to be highly instrumented and
>>>>>> lock-tastic (mark-bit locking on every ref assignment).
>>>>> that can be done with atomic ops...
>>>>
>>>> It can only be done correctly using a DCAS, (not dwCAS), so isn't
>>>> implementable on any mainstream chip. So you have to lock the mark
>>>> bits on an object. This can be done as a spin-lock using atomic ops,
>>>> but that doesn't change the number or frequency they have to be taken.
>>>
>>> I might miss something (and I don't know what DCAS is) but as you
>>> always go in one direction (unmarked -> marked) I don't see why
>>> something like (for 32 bit)
>>>
>>> mark(uint[]bitfield,uint bit){
>>> auto idx=(bit+31)/32;
>>> auto mask=1>>(bit&0x1F);
>>> volatile auto oldV=bitfield[idx];
>>> while((oldV & mask)==0){
>>> auto oldV=atomicCAS(bitfield[idx],oldV | mask,oldV);
>>> }
>>> }
>>>
>>> this can be extended if you have 3 states (white,gray,black) for
>>> concurrent GC or to refCounting.
>>> You have collisions only when two threads try to modify the same
>>> memory location, and even then given that the function is so fast to
>>> compute it should not be a big problem.
>>> False sharing makes things worse, but also that should not be
>>> something that cannot be made acceptable distributing a bit the mark
>>> bits in the memory.
>>>
>>> At the end of the mark phase you need a memory barrier in each working
>>> thread, and a semaphore (or some flag) to ensure that all changes have
>>> global visibility, but that is it.
>>
>> This works in a world where all reference updates are done via in-
>> language operations. But D supports both extern (C) functions and
>> inline assembly, both of which defeat incremental collection.
>
> this works with a stop the world approach as well as now (and is
> parallel).
>
> I agree that to use that in a concurrent GC then you need some way to
> ensure that an update to a reference does not loose references.
> That is possible only if you have some control on the references
> updates, so either that has to be exposed to external programs, or you
> declare programs that modify references transferring the reference to an
> object from a memory location to another (simply loosing a reference or
> adding one is not a problem, only the combination is), as unsafe to be
> run during a GC mark phase if you use a concurrent GC and do not notify
> it of the change.
> Not the ideal solution, but that is all you can do: ether you stop the
> threads, or some control on the modifications has to be done...
The DCAS, or double compare and swap, does a compare and swap at two
different memory locations simultaneously. The problem is that in a
concurrent/incremental GC, you have to propagate the mark bits correctly.
And to do this you have to atomically set the mark-bits and the assigned
reference at the same time...although in writing this I think I figured
out at least two ways of doing this correctly without a DCAS. One involves
recursively propagating the mark-bits yourself, which could cause the
thread to pause for a long time and retain excess memory, or adding the
new node to a to-mark list, which would add a some serialization.
More information about the dmd-concurrency
mailing list