[phobos] Gcx: Would we ever want more than one?

Mon May 16 12:37:12 PDT 2011

On Mon, 16 May 2011 14:17:00 -0400, Sean Kelly <sean at invisibleduck.org>  
wrote:

> On May 14, 2011, at 7:09 PM, Brad Roberts wrote:
>
>> On 5/14/2011 7:02 PM, David Simcha wrote:
>>> On 5/14/2011 8:28 PM, Sean Kelly wrote:
>>>> Technically, you want a free list per core. I don't know how  
>>>> practical it is to figure that out though.
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On May 12, 2011, at 8:14 PM, David Simcha<dsimcha at gmail.com>  wrote:
>>>
>>> The idea being that, if you have a free list per core, there will  
>>> almost never be any contention in practice, even if
>>> you have way more threads than cores?
>>
>> Ideally neither contention nor cache swapping.  It'd stay in the l1 or 
>> l2 of the core directly involved with the
>> allocations.  By being thread centric even if not contended it could  
>> still wander between cores and thus the caches
>> associated with them.
>>
>> A serious micro-optimization, but..
>
> I mentioned it mostly because it seemed an option worth exploring if a  
> free list per thread turned out to be very difficult for some reason.  A  
> fixed array of free lists, one per core, would be easy if there were a  
> way to determine which core the caller was being executed by.  We may  
> have to figure out the per-thread stuff anyway though, since non-shared  
> data needs to be finalized by its owner thread.  Again, this could be  
> done by the owner core instead, but only if we could ensure that threads  
> don't move between cores.

Regarding thread-specific finalization, this does seem to gum things up a  
bit. The issue I see is that all objects to be finalized need to be placed  
onto some kind of free-list (which each thread would then process later)  
while preserving the object's layout. Objects currently consist of  
{vtable,monitor,data...}. That doesn't really leave any room for a) a next  
object pointer or b) a block-info pointer (which might be used for  
fine-grain-lock/lock-free solutions).

One option is to re-use the monitor for a next pointer. Objects with a  
valid monitor would be finalized globally and zeroed before being placed  
on the local free list and 'local' objects would have the next  
pointer/monitor re-nulled prior to finalization. I see one potential corner  
case with this. If an object synchronizes on/calls a synchronized method  
on another object during its finalizer, then (possibly silent) corruption  
could occur. Now doing this is a) accessing "references [that] may no  
longer be valid" according to the spec and b) extremely rare  
(shared/synchronized objects generally would have a valid monitor prior to  
sweeping and would be finalized 'globally' not 'locally'). Yes, this is a  
bug in the users' code, but it's a bug that today will segfault or run  
correctly, not corrupt things.
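
To make the monitor re-use concrete, here is a rough sketch in C (not the
actual druntime code). Obj, FinalizeList, push_for_finalization, drain and
demo_finalizer are all hypothetical names, and the real object header is
only approximated as {vtable, monitor, data}:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the object layout {vtable, monitor, data...}. */
typedef struct Obj {
    void *vtable;
    void *monitor;  /* reused as the 'next' link while the object is queued */
    int   data;
} Obj;

/* Hypothetical per-thread queue of objects awaiting finalization. */
typedef struct FinalizeList {
    Obj *head;
} FinalizeList;

/* Sweep side: queue an object whose monitor slot is unused (NULL).
 * Objects with a real monitor are assumed to take the global route. */
static void push_for_finalization(FinalizeList *list, Obj *o)
{
    assert(o->monitor == NULL);
    o->monitor = list->head;   /* monitor slot now holds the next pointer */
    list->head = o;
}

/* Illustrative finalizer: it must never see the link in the monitor slot. */
static void demo_finalizer(Obj *o)
{
    assert(o->monitor == NULL);
    o->data = -1;
}

/* Owner-thread side: unlink each object, re-null the monitor slot, then
 * run the finalizer. Returns the number of objects finalized. */
static int drain(FinalizeList *list, void (*finalizer)(Obj *))
{
    int n = 0;
    while (list->head != NULL) {
        Obj *o = list->head;
        list->head = (Obj *)o->monitor;
        o->monitor = NULL;     /* re-null before the finalizer runs */
        finalizer(o);
        n++;
    }
    return n;
}
```

The corner case above shows up in drain: if a finalizer locks some other
object that is still queued, it would treat that object's next pointer as
a monitor.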

Storing a block-info pointer as part of the free-list node provides a nice  
performance gain and allows for finer-grained locking. However, direct  
substitution won't work as there is no room inside  
{vtable,monitor/next*,data...} for a block-info*. One option would be to  
place the block-info* at the end of the object's allocation chunk. This  
would effectively mean adding an extra word to finalized objects for the  
purpose of allocation size.
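
A rough C sketch of the trailing block-info* idea, to show where the extra
word goes. Everything here is hypothetical (BlockInfo, the helper names),
and I'm assuming power-of-two size classes with a 16-byte minimum purely
for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical block descriptor; the real contents are not specified here. */
typedef struct BlockInfo {
    void  *base;
    size_t size;
} BlockInfo;

/* Bytes to request for a finalizable object: the object itself plus one
 * extra word reserved for the trailing BlockInfo*. */
static size_t finalizable_alloc_size(size_t object_size)
{
    return object_size + sizeof(BlockInfo *);
}

/* Assumed size classes: powers of two, starting at 16 bytes. */
static size_t bin_size(size_t request)
{
    size_t s = 16;
    while (s < request)
        s <<= 1;
    return s;
}

/* Store the BlockInfo* in the last word of the chunk. memcpy avoids any
 * alignment assumptions about the end of the chunk. */
static void set_trailing_info(void *chunk, size_t chunk_size, BlockInfo *info)
{
    assert(chunk_size >= sizeof info);
    memcpy((char *)chunk + chunk_size - sizeof info, &info, sizeof info);
}

/* Read the BlockInfo* back from the last word of the chunk. */
static BlockInfo *get_trailing_info(const void *chunk, size_t chunk_size)
{
    BlockInfo *info;
    assert(chunk_size >= sizeof info);
    memcpy(&info, (const char *)chunk + chunk_size - sizeof info, sizeof info);
    return info;
}
```

With these assumed size classes, an object that exactly fills a 16-byte
bin would spill into the 32-byte bin once the extra word is added, which
is the cost being weighed here.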