Memory Management in D: Request for Comment

BCS none at anon.com
Wed Nov 4 15:05:44 PST 2009


Hello dsimcha,

> == Quote from BCS (none at anon.com)'s article
> 
>> Hello dsimcha,
>> 
>>> 3.  This one is an order of magnitude less likely than the other two
>>> to actually get implemented, at least by me, but how about
>>> thread-local allocators so you can call malloc() without taking a
>>> lock?  I vaguely remember Sean saying he was working on that a while
>>> back, but I never heard anything about it again.  It's probably best
>>> to wait for shared to be implemented for this so that unshared
>>> objects can also be collected w/o stopping the world, but we should
>>> start at least discussing this now.
>>> 
>> I think there are general malloc functions that are lock free. On the
>> same note, I've been toying with an idea for a memory heap
>> structure that would be totally lock free (think CAS) but has some
>> downsides (like it won't handle a non-contiguous memory space, and
>> expanding the memory could get very expensive). Both of these could
>> be non-issues in some cases: short-lived job-specific heaps, memory
>> mapped (shared?) data(base) files, resource-limited systems, etc.
>> I've been meaning to get around to implementing it but never had
>> time. I'd be willing to collaborate or, if even that doesn't get me
>> moving, just hand over the whole project in exchange for a
>> "coauthor" line in the comments.
>
> But x86 atomic ops, while not as slow as locks, are pretty slow.  Some
> benchmarks I did a while back indicated that, on Windows, atomic
> increment is only about twice as fast as an uncontested synchronized {
> var++; }.

IIRC a mutex/lock has to do a CAS at some point, so if you can get a function 
to work with a single CAS, then at worst an uncontested operation will be as 
cheap as an uncontested mutex/lock. Once contention starts happening, it depends 
on how much each retry costs. Another thing to take into account is the overhead 
vs. granularity tradeoff you get with locks, which can sometimes be skipped with 
direct use of CAS.

> I think using thread-local storage and having each
> thread-local heap interact with the global heap only occasionally, for
> very large transactions, is a much better bet than using atomic CAS.

I got started thinking about this when I came across a comment that some 
particular malloc didn't have issues with freeing memory from a different 
thread than the one it was malloc'ed from. 

> 
> I'm also a little bit confused about what you're suggesting,
> specifically about the non-contiguous memory space.  Does this mean
> that things would have to be freed LIFO?  If so, what use cases would
> you be covering that TempAlloc doesn't already cover?  If not, what
> does it mean?

IIRC, when the current GC runs out of space, it uses mmap (on Linux) to get 
more address space mapped to something. This wouldn't work for my idea because 
you can't count on mmap returning a block of RAM at any given location; it 
is free to leave gaping holes between the places you can allocate at, and I 
wouldn't be able to handle that.

More information about the Digitalmars-d mailing list