Supporting emulated tls

Sun Mar 18 11:39:04 PDT 2012

On Sun, 18 Mar 2012 12:21:51 +0000,
Iain Buclaw <ibuclaw at ubuntu.com> wrote:

> On 18 March 2012 11:32, Johannes Pfau <nospam at example.com> wrote:
> > I thought about supporting emulated tls a little. The GCC emutls.c
> > implementation currently can't work with the gc, as every TLS
> > variable is allocated individually and therefore we don't have a
> > contiguous memory region for the gc. I think these are the possible
> > solutions:
> >
> > * Try to fix GCC's emutls to allocate all tls memory for a module
> >  (application/shared object) at once. That's the best solution
> >  and native TLS works this way, but I'm not sure if we can extract
> >  enough information from the runtime linker to make this work (we
> >  need at least the combined size of all tls variables).
> >
> > * Provide a callback in GCC's emutls which is called after every
> >  allocation. This could call GC.addRange for every variable, but I
> >  guess adding huge amounts of ranges is slow.
> >
> 
> Painfully slow.
> 
> 
> > * Make it possible to register a custom allocator for GCC's emutls
> > (not sure if possible, as this would have to be set up very early in
> >  application startup). Then allocate the memory directly from the GC
> >  (but this memory should only be scanned, not collected)
> >
> > * Replace the calls to malloc in emutls.c with a custom
> > region-based memory allocator. (This is not a perfect solution though, it
> > can always happen that we'll need more memory)
> >
> >
> >
> > * Do not use GCC's emutls at all, roll a custom solution. This
> > could be compatible with / based on dmd's tls emulation for OSX.
> > Most of the implementation is in core.thread, all that's necessary
> > is to group the tls data into a _tls_data_array and call
> > ___tls_get_addr for every tls access. I'm not sure if this can be
> > done in the 'middle-end' though and it doesn't support shared
> > libraries yet.
> >
> 
> If we are going to fix TLS, I'd rather it be in the most platform
> agnostic way possible, if it could be helped. That would mean also
> scrapping the current implementation on Linux (just tries to mimic
> what dmd does, and has corner cases where it doesn't always get it
> right).

You mean getting rid of __tls_beg and __tls_end? I'd also like to
remove those, but:

TLS is mostly object-format specific (not as much OS specific). The ELF
implementation lays out the TLS data for a module (module = shared
library or the application) in a contiguous way. The details are
described in "ELF Handling For Thread-Local
Storage" (www.akkadia.org/drepper/tls.pdf).

The GC requires the TLS blocks to be contiguous. GCC's emulated TLS
allocates every variable individually, so it does not meet this
requirement, and that is what causes the issues there.
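
To illustrate the contiguity point, here is a minimal sketch of a
region-based allocator in C. It is not code from emutls.c; the names
tls_region and region_alloc are made up for this example. Every
allocation comes out of one backing block, so the GC would only need a
single range (base, used) per thread. The fixed capacity also shows the
weakness mentioned above: once the region is full, the allocation fails.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-thread region; all TLS variables of a thread live here. */
struct tls_region {
    unsigned char *base;     /* start of the backing block */
    size_t         capacity; /* total bytes available */
    size_t         used;     /* bytes handed out so far, including padding */
};

/* Bump-allocate size bytes with the given power-of-two alignment.
   Returns NULL when the region cannot satisfy the request. */
static void *region_alloc(struct tls_region *r, size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);

    uintptr_t cur     = (uintptr_t)r->base + r->used;
    uintptr_t aligned = (cur + (align - 1)) & ~(uintptr_t)(align - 1);
    size_t    padding = (size_t)(aligned - cur);
    size_t    left    = r->capacity - r->used;

    if (size > left || padding > left - size)
        return NULL; /* region exhausted: the "need more memory" case */

    r->used += padding + size;
    return (void *)aligned;
}
```

With this layout the range to scan is simply base .. base + used, instead
of one range per variable.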

For native TLS/ELF this requirement is met, but the GC also has to know
the start and the size of the TLS sections. Although the runtime
linker has this information, there's no standard way to access it. So
we could:

* Add a custom extension API to the C libraries. We'd need at least a
  'tls_range dl_get_tls_range(void *handle)' function related to the
  dl* set of functions in the runtime linker, and a 'tls_range
  dl_get_tls_range2(struct dl_phdr_info *info)' to be used with
  dl_iterate_phdr. We also need some way to get the tls range for the
  application, 'get_app_tls_range' (although some libcs also return
  the application module in dl_iterate_phdr).

This seems to be the best way, but we'd have to patch every C library
and it would take some time till those updated C libraries are widely
deployed.
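
For reference, part of this information is already reachable through
public interfaces: each ELF module's program headers contain a PT_TLS
entry describing its TLS template. The sketch below assumes a glibc-style
dl_iterate_phdr and <link.h>; tls_template, tls_template_of and
total_tls_template_size are hypothetical names, not an existing API. Note
that it only yields the size and the initialization image of each
module's TLS block. It does not yield the per-thread start address, which
is exactly the part the proposed extension would have to provide.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* glibc only declares dl_iterate_phdr with this set */
#endif
#include <assert.h>
#include <link.h>
#include <stddef.h>

/* A TLS variable so that the application itself has a PT_TLS segment. */
__thread int tls_probe = 42;

/* Hypothetical description of one module's TLS template. */
struct tls_template {
    const void *init_image; /* initialized TLS data in the loaded module */
    size_t      init_size;  /* bytes copied from the image (p_filesz) */
    size_t      block_size; /* size of the TLS block per thread (p_memsz) */
    size_t      align;      /* required alignment (p_align) */
    int         found;      /* nonzero if the module has a PT_TLS entry */
};

/* Look up the PT_TLS program header of one module, if it has one. */
static struct tls_template tls_template_of(const struct dl_phdr_info *info)
{
    struct tls_template t = {0};
    assert(info != NULL);
    for (int i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        if (ph->p_type == PT_TLS) {
            t.init_image = (const void *)(info->dlpi_addr + ph->p_vaddr);
            t.init_size  = (size_t)ph->p_filesz;
            t.block_size = (size_t)ph->p_memsz;
            t.align      = (size_t)ph->p_align;
            t.found      = 1;
            break;
        }
    }
    return t;
}

/* dl_iterate_phdr callback: add up the TLS block sizes of all modules. */
static int sum_tls_cb(struct dl_phdr_info *info, size_t size, void *data)
{
    (void)size;
    struct tls_template t = tls_template_of(info);
    if (t.found)
        *(size_t *)data += t.block_size;
    return 0; /* 0 means: continue with the next module */
}

/* Combined size of the TLS templates of all currently loaded modules. */
size_t total_tls_template_size(void)
{
    size_t total = 0;
    dl_iterate_phdr(sum_tls_cb, &total);
    return total;
}
```

This gives the "combined size of all tls variables" for the modules
loaded at the time of the call, ignoring alignment padding between
blocks. Whether the application module is reported at all depends on the
libc, as noted above.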

The other solution is to hook directly into each C library's non-public
(and maybe non-stable!) API. For example, the structure returned by BSD
libc's dl_iterate_phdr and dlopen has these fields:

 int tlsindex;        /* Index in DTV for this module */
 void *tlsinit;       /* Base address of TLS init block */
 size_t tlsinitsize;  /* Size of TLS init block for this module */
 size_t tlssize;      /* Size of TLS block for this module */
 size_t tlsoffset;    /* Offset of static TLS block for this module */
 size_t tlsalign;     /* Alignment of static TLS block */

tlsindex gives us the start address of the module's TLS block in every
thread, as long as we know how to compute the TLS address from the TP
(thread pointer) and the dtv index (there are basically 2 methods,
described in "ELF Handling For Thread-Local Storage"), and tlssize gives
us the size.
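
As a rough illustration of the two layouts, here is a simplified sketch
in C. It only covers modules in the static TLS area, where the block
sits at a fixed offset from the thread pointer, and it uses the
tlsoffset and tlssize fields rather than a DTV lookup. The assumption,
following the ELF TLS document, is that in variant 1 the blocks lie
above the TP and in variant 2 (e.g. x86) below it. The names tls_range,
tls_variant and static_tls_range are invented for this example, and the
exact offset convention would have to be checked against each libc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum tls_variant {
    TLS_VARIANT_1, /* TLS blocks at increasing addresses above the TP */
    TLS_VARIANT_2  /* TLS blocks at decreasing addresses below the TP */
};

/* Hypothetical range type: what the GC would be told to scan. */
struct tls_range {
    uintptr_t start;
    size_t    size;
};

/* Compute one module's static TLS range for a thread with thread pointer tp. */
static struct tls_range static_tls_range(enum tls_variant variant,
                                         uintptr_t tp,
                                         size_t tlsoffset,
                                         size_t tlssize)
{
    struct tls_range r;
    if (variant == TLS_VARIANT_2) {
        assert(tlsoffset <= tp);   /* the block must not wrap below zero */
        r.start = tp - tlsoffset;  /* offset counts downwards from the TP */
    } else {
        r.start = tp + tlsoffset;  /* offset counts upwards from the TP */
    }
    r.size = tlssize;
    return r;
}
```

Modules loaded later with dlopen may get their TLS block allocated
dynamically; for those the address has to come from the DTV entry
selected by tlsindex, which this sketch does not cover.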

However, there doesn't seem to be a painless way to do this...