GC experiments. Writing my own GC.

Adam Sakareassen via Digitalmars-d digitalmars-d at puremagic.com
Tue May 13 01:29:18 PDT 2014


Destructors will almost certainly be run on the background thread,
although those run at shutdown will likely be on the main thread.
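
For resources that must be released on the thread that acquired them (the
UI-handle case Rainer mentions below), a forwarding mechanism along these
lines could work.  This is only a rough sketch with made-up names
(ReleaseHandle, UIResource), not something from my implementation, and a
real finalizer would have to be careful about allocating while a
collection is in progress:

import std.concurrency : Tid, thisTid, send, receiveTimeout;
import core.time : msecs;
import std.stdio : writeln;

// Hypothetical message type: "please release this handle".
struct ReleaseHandle { int handle; }

class UIResource
{
    int handle;   // stands in for a handle that must be freed on its owner thread
    Tid owner;    // the thread that acquired the handle

    this(int h) { handle = h; owner = thisTid; }

    ~this()
    {
        // The destructor may run on the collector's thread; forward the
        // real release to the owning thread instead of doing it here.
        send(owner, ReleaseHandle(handle));
    }
}

void main()
{
    auto r = new UIResource(42);
    destroy(r);   // run the destructor explicitly for illustration

    // The owning thread drains its mailbox and performs the actual release.
    receiveTimeout(10.msecs,
        (ReleaseHandle m) { writeln("releasing handle ", m.handle); });
}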

I hope the overall performance results are good.  I really need to see
the timing of the collections to know.  In any case, the allocation
numbers do indicate that some work on removing lock contention is
worthwhile.
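
For a first measurement, wrapping GC.collect() in a monotonic clock would
be enough.  Just a sketch; the allocation loop is only there to give the
collector some work to do:

import core.memory : GC;
import core.time : MonoTime;
import std.stdio : writeln;

void main()
{
    // Create some garbage so the collection has real work to do.
    foreach (i; 0 .. 1_000_000)
    {
        auto a = new ubyte[32];
    }

    immutable before = MonoTime.currTime;
    GC.collect();
    immutable elapsed = MonoTime.currTime - before;
    writeln("collection took ", elapsed.total!"msecs", " ms");
}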

On 13/05/2014 5:04 PM, Rainer Schuetze via Digitalmars-d wrote:
>
> This sounds pretty cool.
>
> On 13.05.2014 06:42, Adam Sakareassen via Digitalmars-d wrote:
>> Hi all,
>>
> [...]
>> All memory allocations are entirely lock-free (using CAS instructions),
>> so a pre-empted thread will never block another.  For allocations of
>> less than 128 bytes, each thread is given its own memory pool to avoid
>> false sharing in the CPU's cache.
>
> There was a GSoC project a few years ago which was planning to do
> something similar, but then switched to go for implementing precise
> scanning. I always wanted to add lock-free memory allocations, too.
>
> [...]
>>
>> The mark phase needs to stop the world.  The sweeping portion of the
>> collection will run in the background.  This is similar to the current
>> implementation, as the world is restarted after the mark phase;
>> however, the thread doing the collection will not return the requested
>> memory to the calling thread until after the sweep has completed.  This
>> means that single-threaded applications always wait for the full
>> garbage collection cycle.
>
> Does this mean that destructors/finalizers are called from a different
> thread? So far it was never guaranteed that they run on the same thread
> that allocated the object, but effectively that happens for
> single-threaded applications. It could help to continue doing that, as
> some resources have to be released from the same thread on which they
> were acquired (e.g. UI handles). For multi-threaded applications, I'm
> fine with "guaranteeing" that the destructor is never run on the
> allocating thread, though some standard mechanism should exist to
> forward the work to another thread.
>
>>
>> So far, allocation speed seems to have improved.  I can't test
>> collection speed as that part is not complete yet.  As a test I wrote a
>> simple function that allocates a linked list of 2 million items.  This
>> function is then spawned by 20 threads.  This test script is shown
>> below.  Timing for allocation (with GC disabled) is as follows (using
>> DMD 2.065):
>>
>> Existing GC code:  15700 ms (average)
>> My GC code:          500 ms (average)
>
> Very promising numbers :-)
>
>>
>> When performing the same number of allocations on a single thread, the
>> new code is still slightly faster than the old.
>>
>> What this demonstrates is that the locking mechanisms in the current GC
>> code are a huge overhead for multi-threaded applications that perform a
>> lot of memory allocations (i.e. those that use the “new” operator or
>> dynamic arrays).
>>
>> It would be nice to see the default GC and memory allocator improved.
>> There is certainly room for improvement on the allocator end, which may
>> mask some of the performance issues associated with garbage collection.
>>
>> In the future I think D needs to look at making collection precise.  It
>> would not be too hard to adjust the mark-and-sweep GC to be nearly
>> precise.  The language needs to support precise GC before things like
>> moving garbage collection become feasible.
>
> You might check how compatible your implementation is with this:
> https://github.com/rainers/druntime/tree/gcx_precise2
>
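
For anyone curious about the lock-free fast path described in the quote
above, the idea is roughly the shape below.  This is a minimal sketch with
illustrative names (ThreadPool, allocate), not the actual pool code, and
it ignores alignment, size classes and the slow path entirely:

import core.atomic : atomicLoad, cas;

// Per-thread pool for small allocations (less than 128 bytes in my case),
// so threads do not share cache lines.  The bump offset is advanced with
// CAS, so a pre-empted thread never blocks another.
struct ThreadPool
{
    shared size_t top;   // current bump offset into `memory`
    size_t capacity;     // size of the backing block
    ubyte* memory;       // backing block for this thread's small allocations

    // Returns a pointer into the pool, or null when the pool is exhausted
    // (the caller would then fall back to a slower path).
    void* allocate(size_t size)
    {
        for (;;)
        {
            size_t cur  = atomicLoad(top);
            size_t next = cur + size;
            if (next > capacity)
                return null;
            if (cas(&top, cur, next))   // lock-free: just retry on contention
                return memory + cur;
        }
    }
}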

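The test script itself was not quoted above, but a benchmark of the kind
described looks roughly like this.  It is a reconstruction for
illustration only, not the original script: each of 20 threads builds a
linked list of 2 million nodes with the GC disabled, so only allocation
is measured:

import core.memory : GC;
import core.thread : Thread;
import core.time : MonoTime;
import std.stdio : writeln;

struct Node { Node* next; size_t value; }   // hypothetical node layout

// Each thread builds a 2-million-node linked list, so small GC
// allocations dominate the cost.
void buildList()
{
    Node* head = null;
    foreach (size_t i; 0 .. 2_000_000)
        head = new Node(head, i);
}

void main()
{
    GC.disable();   // measure allocation only, as in the numbers above

    immutable before = MonoTime.currTime;

    Thread[] threads;
    foreach (n; 0 .. 20)
    {
        auto t = new Thread(&buildList);
        t.start();
        threads ~= t;
    }
    foreach (t; threads)
        t.join();

    immutable elapsed = MonoTime.currTime - before;
    writeln("allocation took ", elapsed.total!"msecs", " ms");
}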

