Thread local and memory allocation

Mon Oct 3 13:54:22 PDT 2011

On Oct 3, 2011, at 12:48 PM, deadalnix wrote:

> D uses thread-local storage for most of its data, and that's a good thing.
> 
> However, the allocation mechanism isn't aware of it. In addition, as things are currently specified, it has no way to handle it in the future.
> 
> Because the type system guarantees that shared memory never holds a pointer to thread-local data, this is something the GC could use to its advantage.
> 
> Since good practice is to minimize the use of shared data as much as possible, this design choice makes things worse for good designs, which is, IMO, not D's philosophy.
> 
> The advantages of handling this at the memory management level are the following:
> - Swap friendliness. The data of a given thread can be located in its own blocks, so an idle thread can be swapped out easily without a huge performance penalty. Anyone who has used Chrome and Firefox with lots of tabs on a machine with limited memory knows what I'm talking about: Firefox uses less memory than Chrome, but its performance is terrible anyway, because Chrome's memory layout is more cache friendly (the tabs' memory isn't mixed together).
> - Efficiency in heavily multithreaded applications like servers: the more threads run in the program, the more costly a stop-the-world GC is. Since good design implies keeping each thread's data as separate as possible, a thread-local collection can be triggered at any time without stopping other threads.
> 
> Even if those improvements are not implemented yet, or anytime soon, it's kind of sad that the current interface doesn't allow for this.
> 
> What I suggest is adding a SHARED flag to BlkAttr and storing it as an attribute of the block. Later modifications could then be made based on this flag. The attribute itself shouldn't be modifiable later on.
> 
> What do you think? Is it something worth working on? If it is, how can I help?

There's another important issue that hasn't yet been addressed, which is that when the GC collects memory, the thread that finalizes non-shared data should be the one that created it.  So that SHARED flag should really be a thread-id of some sort.  Alternately, each thread could allocate from its own pool, with shared allocations coming from a common pool.  This would allow the lock granularity to be reduced and in some cases eliminated.
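
To make the per-thread pool idea concrete, here is a rough sketch in D. It is not how the current GC works, and nothing in it is an existing druntime API apart from Thread and Mutex: Block, Pool, localPool, sharedPool, allocate and mayFinalizeHere are all hypothetical names invented for illustration. Each block records its owning thread (null meaning shared) rather than a boolean SHARED flag, thread-local allocations take no lock, and only the shared pool is guarded.

```d
import core.sync.mutex : Mutex;
import core.thread : Thread;

// Hypothetical block header: stores the owning thread rather than a
// boolean SHARED flag. A null owner means the block is shared.
struct Block
{
    Thread owner;
    ubyte[] payload;
}

// Hypothetical pool: just a list of the blocks it has handed out.
final class Pool
{
    Block*[] blocks;

    Block* allocate(size_t size, Thread owner)
    {
        auto b = new Block(owner, new ubyte[size]);
        blocks ~= b;
        return b;
    }
}

Pool localPool;             // module-level variables are thread-local by default in D
__gshared Pool sharedPool;  // one common pool for shared allocations
__gshared Mutex sharedLock; // guards sharedPool only

shared static this()        // runs once per process
{
    sharedPool = new Pool;
    sharedLock = new Mutex;
}

static this()               // runs once per thread
{
    localPool = new Pool;
}

// Thread-local allocations never touch the lock; shared ones do.
Block* allocate(size_t size, bool isShared = false)
{
    if (!isShared)
        return localPool.allocate(size, Thread.getThis());

    sharedLock.lock();
    scope (exit) sharedLock.unlock();
    return sharedPool.allocate(size, null);
}

// A non-shared block should only be finalized by the thread that created it.
bool mayFinalizeHere(Block* b)
{
    return b.owner is null || b.owner is Thread.getThis();
}
```

With this layout, a block obtained from allocate(64) passes mayFinalizeHere only on the thread that allocated it, while a block obtained from allocate(64, true) passes on any thread. A real collector would also have to decide what happens to a pool when its owning thread terminates, which this sketch ignores.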

I'd like to move to CDGC as an intermediate step, and that will need some testing and polish.  That would allow for precise collections if the compiler support is added.  Then the thread-local finalization has to be tackled one way or another.  I'd favor per-thread heaps but am open to suggestions and/or help.