[phobos] Gcx: Would we ever want more than one?

Robert Jacques sandford at jhu.edu
Mon May 16 12:37:12 PDT 2011


On Mon, 16 May 2011 14:17:00 -0400, Sean Kelly <sean at invisibleduck.org>  
wrote:

> On May 14, 2011, at 7:09 PM, Brad Roberts wrote:
>
>> On 5/14/2011 7:02 PM, David Simcha wrote:
>>> On 5/14/2011 8:28 PM, Sean Kelly wrote:
>>>> Technically, you want a free list per core. I don't know how  
>>>> practical it is to figure that out though.
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On May 12, 2011, at 8:14 PM, David Simcha<dsimcha at gmail.com>  wrote:
>>>
>>> The idea being that, if you have a free list per core, there will  
>>> almost never be any contention in practice, even if
>>> you have way more threads than cores?
>>
>> Ideally neither contention nor cache swapping.  It'd stay in the l1 or 
>> l2 of the core directly involved with the
>> allocations.  By being thread centric even if not contended it could  
>> still wander between cores and thus the caches
>> associated with them.
>>
>> A serious micro-optimization, but..
>
> I mentioned it mostly because it seemed an option worth exploring if a  
> free list per thread turned out to be very difficult for some reason.  A  
> fixed array of free lists, one per core, would be easy if there were a  
> way to determine which core the caller was being executed by.  We may  
> have to figure out the per-thread stuff anyway though, since non-shared  
> data needs to be finalized by its owner thread.  Again, this could be  
> done by the owner core instead, but only if we could ensure that threads  
> don't move between cores.

Regarding thread-specific finalization, this does seem to gum things up a  
bit. The issue I see is that all objects to be finalized need to be placed  
onto some kind of free-list (which each thread would then process later)  
while preserving the object's layout. Objects currently consist of  
{vtable,monitor,data...}. That doesn't really leave any room for a) a next  
object pointer or b) a block-info pointer (which might be used for  
fine-grain-lock/lock-free solutions).
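
To make the space problem concrete, here is a minimal sketch in C (not
druntime code) of the {vtable,monitor,data...} layout. ObjectHeader, Point
and spare_header_bytes are hypothetical names used only for illustration,
and the sketch assumes a header of exactly two pointer-sized words.

```c
#include <stddef.h>

/* Hypothetical C mirror of the {vtable, monitor, data...} object layout. */
typedef struct ObjectHeader {
    void *vtable;   /* pointer to the class's virtual table */
    void *monitor;  /* lock object; NULL until the object is synchronized on */
} ObjectHeader;

/* Example instance: the header is followed directly by the fields. */
typedef struct Point {
    ObjectHeader hdr;
    int x, y;
} Point;

/* Header bytes that belong to neither vtable nor monitor, i.e. the room
 * available for a next pointer or block-info pointer without touching
 * the object's own data. */
static size_t spare_header_bytes(void) {
    return offsetof(Point, x) - 2 * sizeof(void *);
}
```

Under those assumptions spare_header_bytes() is zero: any extra link has to
either borrow an existing header slot or live outside the header.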

One option is to re-use the monitor for a next pointer. Objects with a  
valid monitor would be finalized globally and zeroed before being placed  
on the local free list and 'local' objects would have the next  
pointer/monitor re-nulled prior to finalization. I see one potential corner  
case with this. If an object synchronizes on/calls a synchronized method  
on another object during its finalizer, then (possibly silent) corruption  
could occur. Now doing this is a) accessing "references [that] may no  
longer be valid" according to the spec and b) extremely rare  
(shared/synchronized objects generally would have a valid monitor prior to  
sweeping and would be finalized 'globally' not 'locally'). Yes, this is a  
bug in the users' code, but it's a bug that today will segfault or run  
correctly, not corrupt things.
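
A rough sketch of the monitor-reuse idea, again in C rather than actual
runtime code. Obj, FinalizeList, queue_for_local_finalize and drain are
made-up names, the 'finalized' field stands in for real instance data, and
the list is assumed to be touched only by its owning thread.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Obj {
    void *vtable;
    void *monitor;    /* doubles as the "next" link while the object is queued */
    int   finalized;  /* stand-in for instance data */
} Obj;

/* Per-thread list of objects awaiting finalization (hypothetical). */
typedef struct FinalizeList { Obj *head; } FinalizeList;

/* Queue an object for its owner thread. Returns 0 and leaves the object
 * untouched if it has a live monitor; such objects would be finalized
 * globally instead. */
static int queue_for_local_finalize(FinalizeList *l, Obj *o) {
    if (o->monitor != NULL)
        return 0;
    o->monitor = l->head;  /* thread the list through the monitor slot */
    l->head = o;
    return 1;
}

/* Run on the owner thread: unlink each object, re-null the monitor slot,
 * then call the finalizer. Returns the number of objects finalized. */
static int drain(FinalizeList *l, void (*finalizer)(Obj *)) {
    int n = 0;
    while (l->head != NULL) {
        Obj *o = l->head;
        l->head = (Obj *)o->monitor;
        o->monitor = NULL;  /* the finalizer must never see the link */
        finalizer(o);
        n++;
    }
    return n;
}

/* Example finalizer that checks the monitor slot was restored. */
static void mark_finalized(Obj *o) {
    assert(o->monitor == NULL);
    o->finalized = 1;
}
```

The corner case above corresponds to a finalizer that synchronizes on some
other object which is still queued: it would find a list link sitting in
that object's monitor slot and treat it as a monitor.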

Storing a block-info pointer as part of the free-list node provides a nice  
performance gain and allows for finer-grained locking. However, direct  
substitution won't work as there is no room inside  
{vtable,monitor/next*,data...} for a block-info*. One option would be to  
place the block-info* at the end of the object's allocation chunk. This  
would effectively mean adding an extra word to the allocation size of  
finalized objects.
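
As an illustration of the trailing-word option, a small C sketch follows.
BlockInfo and the three helper functions are hypothetical, and the chunk
size is assumed to be known to the caller (e.g. from the size class).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-block metadata. */
typedef struct BlockInfo { size_t chunk_size; } BlockInfo;

/* Bytes to request for an object that needs finalization: the object
 * rounded up to a whole word, plus one trailing word for the block-info*. */
static size_t chunk_bytes_needed(size_t object_bytes) {
    size_t w = sizeof(void *);
    size_t rounded = (object_bytes + w - 1) / w * w;
    return rounded + w;
}

/* Store the block-info* in the last word of the chunk. */
static void set_block_info(void *chunk, size_t chunk_size, BlockInfo *bi) {
    assert(chunk_size >= sizeof bi);
    memcpy((char *)chunk + chunk_size - sizeof bi, &bi, sizeof bi);
}

/* Read the block-info* back from the last word of the chunk. */
static BlockInfo *get_block_info(const void *chunk, size_t chunk_size) {
    BlockInfo *bi;
    memcpy(&bi, (const char *)chunk + chunk_size - sizeof bi, sizeof bi);
    return bi;
}
```

The object's own {vtable,monitor/next*,data...} bytes at the start of the
chunk are left alone; the cost is the one extra word per such allocation.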

