GC experiments. Writing my own GC.

Tue May 13 00:04:24 PDT 2014

This sounds pretty cool.

On 13.05.2014 06:42, Adam Sakareassen via Digitalmars-d wrote:
> Hi all,
>
[...]
> All memory allocations are entirely lock free (using CAS instructions).
>   So a pre-empted thread will never block another.   For allocations of
> less than 128 bytes, each thread is allocated memory from it's own
> memory pool to avoid false sharing on the CPU's cache.

There was a GSoC project a few years ago which was planning to do 
something similar, but then switched to go for implementing precise 
scanning. I always wanted to add lock-free memory allocations, too.

[...]
>
> The mark phase needs to stop the world.  The sweeping portion of the
> collection will run in the background.   This is similar to the current
> implementation as the world is restarted after the mark phase, however
> the thread doing the collection will not allocate the requested memory
> to the calling thread until after the sweep has completed.  This means
> that single threaded applications always wait for the full garbage
> collection cycle.

Does this mean that destructors/finalizers are called from a different 
thread? It was never guaranteed to be run on the same thread as the 
allocation of the object so far, but effectively that happens for 
single-threaded applications. It could help to continue doing that, some 
resources have to be released from the same thread they have been 
acquired (e.g. UI handles). For multi-threaded applications, I'm fine 
with "guaranteeing" that the destructor is never run on the same thread, 
though some standard mechanism should exist to forward to another thread.

>
> So far allocation speed seems to have improved.  I can't test collection
> speed as it's not complete.  As a test I wrote a simple function that
> allocates a linked list of 2 million items.  This function is then
> spawned by 20 threads.  This test script is shown below.  Timing for
> allocation (with GC disabled) is as follows.  (Using DMD 2.065)
>
> Existing GC code:  15700ms (average)
> My GC code:   500ms (Average)

Very promising numbers :-)

>
> When performing the same amount of allocations on a single thread, the
> new code is still slightly faster than the old.
>
> What this demonstrates is that the locking mechanisms in the current GC
> code is a huge overhead for multi threaded applications that perform a
> lot of memory allocations.  (ie.  Use the “new” operator or dynamic
> arrays.)
>
> It would be nice to see the default GC and memory allocator improved.
> There is certainly room for improvement on the allocator end which may
> mask some of the performance issues associated with garbage collection.
>
> In the future I think D needs to look at making collection precise.  It
> would not be too hard to adjust the mark and sweep GC to be nearly
> precise.  The language needs to support precise GC before things like
> moving garbage collection become feasible.

You might check how compatible your implementation is with this: 
https://github.com/rainers/druntime/tree/gcx_precise2