Thread local and memory allocation

Tue Oct 4 23:19:44 PDT 2011

On Tue, 04 Oct 2011 23:56:52 -0400, Andrew Wiley <wiley.andrew.j at gmail.com> wrote:

> On Tue, Oct 4, 2011 at 10:55 PM, Andrew Wiley <wiley.andrew.j at gmail.com> wrote:
>> On Tue, Oct 4, 2011 at 8:59 PM, Robert Jacques <sandford at jhu.edu> wrote:
>>> On Tue, 04 Oct 2011 10:54:58 -0400, Andrew Wiley <wiley.andrew.j at gmail.com>
>>> wrote:
>>>
>>>> On Tue, Oct 4, 2011 at 3:52 AM, Walter Bright
>>>> <newshound2 at digitalmars.com> wrote:
>>>>>
>>>>> On 10/4/2011 1:22 AM, deadalnix wrote:
>>>>>>
>>>>>> Do you mean manage the memory this way:
>>>>>> Shared heap -> TL pool within the shared heap -> allocation in thread
>>>>>> from TL pool.
>>>>>>
>>>>>> And complete GC collect.
>>>>>
>>>>> Yes.
>>>>>
>>>>>
>>>>>> This is a good solution to reduce contention on allocation. But a very
>>>>>> different thing than I was initially talking about.
>>>>>
>>>>> Yes.
>>>>>
>>>>>
>>>>>> Back to the point,
>>>>>>
>>>>>> Considering you can have a pointer to immutable data from any dataset,
>>>>>> but not the other way around, it is also valid to have a flag for it in
>>>>>> the allocation interface.
>>>>>>
>>>>>> What is the issue with the compiler here?
>>>>>
>>>>> Allocate an object, then cast it to immutable, and pass it to another
>>>>> thread.
>>>>>
>>>>
>>>> Assuming we have to make a call to the GC when an object toggles its
>>>> immutable/shared state, it seems like this whole approach would
>>>> basically murder anyone doing message passing with ownership changes,
>>>> because the workflow tends to be create an object -> cast to shared ->
>>>> send to another thread -> cast away shared -> do work -> cast to
>>>> shared...
>>>> On the other hand, I guess the counterargument is that locking an
>>>> uncontended lock is on the order of two instructions (or so I'm told),
>>>> so casting away shared probably isn't ever necessary. It just seems
>>>> somewhat counterintuitive that casting to and from shared would be
>>>> slower than unnecessarily locking the object.
>>>>
>>>
>>> It's entirely possible to simply allocate the memory for the object from the
>>> shared heap to start with. Then no more calls to the GC are needed.
>>>
>>
>> When an object is created and later cast to shared, the compiler
>> *can't* know that it should allocate from the shared heap because the
>> cast may not be anywhere near where the object was created. The same
>> problem goes for immutable.
>>
>
> If you meant that the *user* should be responsible for making sure
> it's allocated on the shared heap, then yes, that's possible, but
> you're putting GC implementation details into the type system. That
> may or may not be a good thing.
>

I would phrase it as shifting D's memory model towards NUMA. By the way, GPGPU is here to stay and it's NUMA. HPC software is cache-aware, which is NUMA. And all high-end server systems are NUMA-aware, to say nothing of cluster/fabric computing.
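
To make the problem case from upthread concrete (allocate, cast to immutable, pass to another thread), here is a minimal D sketch. The names buildAndFreeze and worker are hypothetical, made up for illustration, and nothing here reflects how druntime's GC is actually organized. The point is only that the allocation site and the cast can be arbitrarily far apart, so the allocator cannot know up front that the memory will end up visible to another thread.

```d
import std.concurrency : spawn, send, receiveOnly, thisTid, Tid;

// Hypothetical helper: allocates an ordinary mutable array and only
// later freezes it with a cast. At the `new`, nothing tells the
// allocator that this memory will ever leave the current thread.
immutable(int)[] buildAndFreeze(int n)
{
    auto data = new int[](n);            // plain GC allocation, typed as thread-local
    foreach (i, ref x; data)
        x = cast(int)(i * i);
    // Safe only because no mutable reference to `data` escapes.
    return cast(immutable(int)[]) data;  // becomes immutable here, not at allocation
}

// Hypothetical worker: receives the frozen array and replies with its sum.
void worker(Tid owner)
{
    auto frozen = receiveOnly!(immutable(int)[])();
    int sum = 0;
    foreach (x; frozen)
        sum += x;
    send(owner, sum);
}

void main()
{
    auto tid = spawn(&worker, thisTid);
    send(tid, buildAndFreeze(4));        // memory allocated in this thread is read in another
    assert(receiveOnly!int() == 14);     // 0 + 1 + 4 + 9
}
```

With a per-thread pool and a per-thread collection, the array above would have to be either allocated from the shared heap from the start (which needs information the compiler does not have at the `new`) or handed over to it at the cast (which is the extra GC call discussed above).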