Concurrent GC (for Windows)

Dmitry Olshansky via Digitalmars-d digitalmars-d at puremagic.com
Thu Jun 12 17:38:31 PDT 2014


12-Jun-2014 10:34, Rainer Schuetze wrote:
>
>
> On 11.06.2014 18:59, Dmitry Olshansky wrote:
>> 03-Jun-2014 11:35, Rainer Schuetze wrote:
>>> Hi,
>>>
>>> more GC talk: the last couple of days, I've been experimenting with
>>> implementing a concurrent GC on Windows inspired by Leandro's CDGC.
>>> Here's a report on my experiments:
>>>
>>> http://rainers.github.io/visuald/druntime/concurrentgc.html
>>>
[snip]
>>
>> See the sketch of the idea here:
>> https://gist.github.com/DmitryOlshansky/5e32057e047425480f0e
>>
>
> Cool stuff! I remember trying something similar, but IIRC forcing the
> same address with MapViewOfFile somehow failed (maybe this was across
> processes). I tried your version on both Win32 and Win64 successfully,
> though.



> I implemented the QueryWorkingSetEx version like this (you need a
> converted psapi.lib for Win32):

Yes, exactly, but I forgot the recipe for converting COFF import
libraries to OMF.

>
> enum PAGES = 512; //SIZE / 4096;
> PSAPI_WORKING_SET_EX_INFORMATION[PAGES] info;
> foreach(i, ref inf; info)
>      inf.VirtualAddress = heap + i * 4096;
> if (!QueryWorkingSetEx(GetCurrentProcess(), info.ptr, info.sizeof))
>      throw new Exception(format("Could not query info (%d).\n",
>                                 GetLastError()));
>
> foreach(i, ref inf; info)
>      writefln("flags page %d: %x", i, inf.VirtualAttributes);
>
>
> and you can check the "shared" field to get copied pages.
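
To make that check concrete, here is a minimal sketch in D. It assumes
the bit layout documented for PSAPI_WORKING_SET_EX_BLOCK (Valid in bit 0,
Shared in bit 15) and treats a resident page that is no longer shared as
a copied one. The names isCopiedPage and copiedPages are made up for
illustration, and the attribute words are passed in as plain size_t
values:

```d
// Bit positions assumed from the documented PSAPI_WORKING_SET_EX_BLOCK layout.
enum size_t WS_VALID  = 1 << 0;   // page is resident; other bits are meaningful
enum size_t WS_SHARED = 1 << 15;  // page is still shared with the section

/// True if the page looks like a private copy, i.e. it is resident
/// but no longer shared (it was written through the CoW view).
bool isCopiedPage(size_t flags) pure nothrow @nogc
{
    return (flags & WS_VALID) != 0 && (flags & WS_SHARED) == 0;
}

/// Collects the indices of copied pages from raw attribute words
/// (hypothetical helper, not part of any API).
size_t[] copiedPages(const(size_t)[] attrs) pure nothrow
{
    size_t[] result;
    foreach (i, f; attrs)
        if (isCopiedPage(f))
            result ~= i;
    return result;
}
```

Pages that are not resident report Valid = 0, so they are skipped here
rather than counted as copies.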


> This function
> is not supported on XP, though.

I wouldn't worry about it; it's not like XP users are growing in
numbers. Also, it looks like only the 64-bit version is good to go, as on
32-bit it would cut usable memory in half.

> A short benchmark shows that VirtualQuery needs 55/42 ms for your test
> on Win32/Win64 on my mobile i7, while QueryWorkingSetEx takes about 17
> ms for both.

Seems in line with my measurements. Strictly speaking, dirtying every
other page (1/2 of the pages, interleaved) should give an estimate of the
worst case. Together with remapping (freeing duplicated pages), it
doesn't go beyond 250 ms on 640 MB of heap.

> If I add the actual copy into heap2 (i.e. every fourth page of 512 MB is
> copied), I get 80-90 ms more.

Aye... this is a lot. Also, for me it turns out that unmapping the CoW
view at the last step takes most of the time. It might help to split the
full heap into multiple views.
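
A rough sketch of how the split could be computed, in D. ViewRange and
splitIntoViews are hypothetical names, and the view size is assumed to be
a multiple of the 64 KB allocation granularity, because view offsets
passed to MapViewOfFile have to be aligned to it. Whether smaller views
actually unmap faster is untested here:

```d
/// One mapped view of the heap: byte offset into the section and length.
struct ViewRange
{
    size_t offset;
    size_t length;
}

/// Splits `heapSize` bytes into views of at most `viewSize` bytes each.
/// `viewSize` is assumed to be a multiple of the 64 KB allocation
/// granularity (an assumption of this sketch, checked by the assert).
ViewRange[] splitIntoViews(size_t heapSize, size_t viewSize) pure nothrow
{
    assert(viewSize > 0 && viewSize % (64 * 1024) == 0);
    ViewRange[] views;
    for (size_t off = 0; off < heapSize; off += viewSize)
    {
        // The last view may be shorter than viewSize.
        immutable len = heapSize - off < viewSize ? heapSize - off : viewSize;
        views ~= ViewRange(off, len);
    }
    return views;
}
```

Each range would then be mapped and unmapped on its own, so the cost of
the final unmap step is spread over several smaller calls.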

Also, using VirtualProtect during the first step to turn a mapping into a
CoW one is faster than unmap/map (by a factor of 2).

One thing that may help is saving a pointer to the end of the used heap
at the moment of the scan, then remapping only this portion as CoW.
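
In code, the length to remap could be derived roughly like this (a
sketch in D; heapBase and usedEnd are hypothetical names, with usedEnd
being the high-water mark saved when the scan starts, and 4096 is the
page size used in the snippet quoted above):

```d
enum size_t PAGE_SIZE = 4096; // page size, as in the quoted snippet

/// Rounds the used part of the heap up to a whole number of pages.
/// Only this many bytes would need to be switched to CoW.
size_t cowLength(size_t heapBase, size_t usedEnd) pure nothrow @nogc
{
    assert(usedEnd >= heapBase);
    immutable used = usedEnd - heapBase;
    return (used + PAGE_SIZE - 1) / PAGE_SIZE * PAGE_SIZE;
}
```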

The last issue I see is adjustment of pointers: in a GC, the mapped view
is mapped at a new address, so pointers would need a fixup during
scanning.
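
The fixup itself is just an offset translation. A minimal sketch in D,
assuming the snapshot view has the same layout as the live heap and only
the base address differs (HeapViews and its members are made-up names):

```d
/// Describes the live heap and its snapshot view (hypothetical names).
struct HeapViews
{
    size_t liveBase;  // address where the mutator sees the heap
    size_t snapBase;  // address where the GC thread mapped the snapshot
    size_t size;      // size of the heap in bytes

    /// True if `p` points into the live heap.
    bool inHeap(size_t p) const pure nothrow @nogc
    {
        return p >= liveBase && p - liveBase < size;
    }

    /// Translates a live-heap pointer to the snapshot address to read from.
    size_t toSnapshot(size_t p) const pure nothrow @nogc
    {
        assert(inHeap(p));
        return p - liveBase + snapBase;
    }
}
```

Pointers stored inside the snapshot still hold live-heap addresses, so
the scanner would check inHeap first and then read the target object at
toSnapshot(p) instead of at p.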

>
> The numbers are not great, but I guess the usual memory usage and number
> of modified pages will be much lower. I'll see if I can integrate this
> into the concurrent implementation.

Wish you luck, I'm still not sure if it will help.

-- 
Dmitry Olshansky

