LLVM and TLS

Sun Feb 22 09:36:47 PST 2015

On Sunday, 22 February 2015 at 04:33:58 UTC, Dan Olson wrote:
> Hmm, you got me thinking.  A mfence should not be needed for
> TLS so in a MT program, expensive TLS lookup could still win.
> If cache is blown, wouldn't time to reload cache begin to
> dominate?  I know all of this is very architecture dependent,
> but I have been wary of the number of instructions to do TLS
> lookup compared to shared.  Perhaps I should not.
> Am I thinking correctly?

The problem is really that synthetic benchmarks compare apples 
and oranges. The "problem" may disappear once the TLS tables are 
loaded into the cache, or if the compiler has hoisted the TLS 
lookup out of the loop and keeps the result in a register (which 
also has a hidden cost). An x86 cache miss is perhaps 100-200 
cycles, and a 3rd level cache load or a full barrier is 30-40 
cycles, but a pure read or write barrier is only a few cycles... 
What is the hidden cost of D TLS versus the optimal codegen for a 
program? I guess you have to compare C vs D on a set of complex 
programs to figure it all out.
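
To make the "moved outside of the loop" case concrete, here is a rough C11 sketch of the three variants a micro-benchmark ends up comparing. The names (tls_counter, shared_counter, bump_*) are made up for illustration and are not from any real benchmark. Note that an optimizing compiler may well turn the first function into the second on its own, which is exactly why such a benchmark can show no difference at all.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counters: one copy per thread, and one plain global. */
static _Thread_local uint64_t tls_counter;
static uint64_t shared_counter;

/* Names the thread-local variable on every iteration. Without
 * optimization this may repeat the TLS address computation each time. */
uint64_t bump_tls(size_t n)
{
    tls_counter = 0;
    for (size_t i = 0; i < n; ++i)
        tls_counter += i;
    return tls_counter;
}

/* Same loop, but the TLS address is taken once, before the loop.
 * The pointer then lives in a register or a stack slot, which is the
 * "hidden cost" of hoisting. */
uint64_t bump_tls_hoisted(size_t n)
{
    uint64_t *p = &tls_counter;
    *p = 0;
    for (size_t i = 0; i < n; ++i)
        *p += i;
    return *p;
}

/* Plain global, no TLS lookup. There is no synchronization here, so
 * this is only valid when a single thread calls it. */
uint64_t bump_shared(size_t n)
{
    shared_counter = 0;
    for (size_t i = 0; i < n; ++i)
        shared_counter += i;
    return shared_counter;
}
```

All three compute the same sum, so any timing difference comes from how the variable is addressed and from what the optimizer did to the loop, not from the work itself. Comparing the generated assembly at -O0 and -O2 is probably more informative than timing it.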