OS X libphobos2.so

Sun Nov 8 10:12:02 PST 2015

On Saturday, 7 November 2015 at 08:37:40 UTC, Jacob Carlborg 
wrote:
> On 2015-11-06 19:46, bitwise wrote:
>
>> Currently, the compiler just calls ___tls_get_addr(void *p) to get the
>> thread local copy of a global. If that function signature is altered to
>> take a pointer to the image as well, the problem is solved.
>
> Hehe, you make it sound so easy. Perhaps I missed something and 
> you know more than I do. But as far as I know you have two 
> options:
>
> 1. Implement native TLS. This will require modifications to the 
> compiler and minor tweaks in the runtime
>
> 2. Continue to use the custom TLS implementation but add 
> support for dynamic libraries. This will require modifications 
> to the compiler (as you said above) and major changes to the 
> runtime
>
> The native TLS implementation works as you described above 
> (roughly). I can hardly believe that the code Apple added to 
> the dynamic linker to implement TLS is not necessary. I don't 
> see how you can get around not implementing the same code as 
> the dynamic linker does.
>
> I also think that this is a good opportunity to change to 
> native TLS. I don't like this situation we have now: "Yeah, D 
> is compatible with C, except TLS on OS X.".

Well, I'm speaking in relative terms when I say easy... ;)

Right now, TLS has a fairly simple implementation. DMD puts all 
global TLS vars into their own section in the binary. Then, at 
each point where those vars are accessed in code, DMD inserts a 
call to ___tls_get_addr(void*) to map the address of the var to 
a thread-specific block of memory. The first time 
___tls_get_addr() is called on a thread, it lazily allocates a 
block of memory for that thread, memcpy's the TLS vars from the 
TLS section in the binary, and stores that thread-local copy 
using pthread_setspecific(). Any subsequent call to 
___tls_get_addr() simply uses pthread_getspecific() to retrieve 
that block of memory, and maps the received address to the 
corresponding address in that block.
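
A minimal sketch of that lazy scheme in C, for illustration only. It is 
not the druntime source: the name sketch_tls_get_addr, the tls_image 
array standing in for the binary's TLS section, and the use of malloc 
are all assumptions. Only the pthread calls are real POSIX API.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the TLS section of the binary: the initial values of all
   thread-local globals. A real runtime would read the section bounds
   from the image; a plain array is an assumption made for this sketch. */
static char tls_image[16] = "initial-values";

static pthread_key_t tls_key;
static pthread_once_t tls_key_once = PTHREAD_ONCE_INIT;

static void make_key(void)
{
    /* free() is run on each thread's block when that thread exits. */
    int rc = pthread_key_create(&tls_key, free);
    assert(rc == 0);
    (void)rc;
}

/* Hypothetical equivalent of ___tls_get_addr(): takes the address of a
   variable inside the TLS section and returns the matching address in
   the calling thread's private copy. */
void *sketch_tls_get_addr(void *p)
{
    pthread_once(&tls_key_once, make_key);

    char *block = pthread_getspecific(tls_key);
    if (block == NULL) {
        /* First access on this thread: allocate the block and copy the
           initial values out of the TLS section. */
        block = malloc(sizeof tls_image);
        if (block == NULL)
            abort();
        memcpy(block, tls_image, sizeof tls_image);
        pthread_setspecific(tls_key, block);
    }
    /* Same offset as in the section, different base address. */
    return block + ((char *)p - tls_image);
}
```

Calling sketch_tls_get_addr(&tls_image[3]) twice on one thread returns 
the same pointer both times, and writes through it leave tls_image 
itself untouched.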

So, since binaries will not be mapped to overlapping address 
ranges, I can loop over all the binary images, find the one whose 
TLS section contains the argument of ___tls_get_addr(), and map 
the pointer to the appropriate block of memory.
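
The range search could look roughly like the following C sketch. The 
image_tls_range struct and find_tls_image function are hypothetical 
names; in a real runtime the ranges would have to be collected from the 
loaded images, which is not shown here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record describing one loaded image's TLS section. */
typedef struct {
    const char *name;       /* image name, for diagnostics only */
    uintptr_t   tls_start;  /* address of the first byte of the section */
    size_t      tls_size;   /* section length in bytes */
} image_tls_range;

/* Linear scan over all known images. Returns the index of the image
   whose TLS section contains addr, or -1 if no image owns it. */
int find_tls_image(const image_tls_range *images, size_t count,
                   const void *addr)
{
    uintptr_t a = (uintptr_t)addr;
    for (size_t i = 0; i < count; ++i) {
        /* Written as a subtraction so the check cannot overflow. */
        if (a >= images[i].tls_start &&
            a - images[i].tls_start < images[i].tls_size)
            return (int)i;
    }
    return -1;
}
```

The cost is one comparison pair per loaded image on every TLS access, 
which is where the performance worry comes from.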

I am concerned that looping over all binary images for each TLS 
access will have performance implications, but for now, this 
solution is good enough. Later, ___tls_get_addr() can be amended 
to pass a pointer to the image from which the TLS originated, 
allowing constant-time lookup. I believe Martin has already done 
this for linux/fbsd, but I haven't had time to look at this 
specific issue.

So... I've got a basic implementation working at this point. The 
global ctors are now used instead of that infernal dyld callback 
to initialize sections. I've tried dynamically loading a shared 
library, and everything seems to work. Next on the list is to 
work on how all this interacts with threads. Martin seems to have 
already solved this too, so it should be fairly straightforward. 
Currently, linking a dylib statically throws "thread.d(2916): 
Unable to suspend thread", but otherwise it seems to work as 
expected.

Anyways, I am open to any help on the TLS stuff if you've got 
time.

      Bit