LLVM and TLS

Sun Feb 22 20:10:28 PST 2015

On Sunday, 22 February 2015 at 17:36:49 UTC, Ola Fosheim Grøstad 
wrote:
> The problem is really in synthetic benchmarks that are comparing 
> apples/oranges. The "problem" may disappear once TLS tables are 
> loaded into the cache or if the compiler has moved the 
> "problem" outside of the loop and retaining it in a register 
> (which also has a hidden cost). An x86 cache miss is perhaps 
> 100-200 cycles and a 3rd level cache load/full barrier is 30-40 
> cycles, but a pure read or write barrier is only a few 
> cycles... What is the hidden cost of D TLS versus the optimal 
> codegen for a program? I guess you have to compare C vs D on a 
> set of complex programs to figure it all out.

Yes, I agree that you can't determine the general performance of 
TLS from such a simple program.

Here's what happened: I was writing a program that could 
optionally use TLS memory. When I turned on TLS memory it slowed 
down considerably, but only when built with an LLVM compiler. No 
matter how I used TLS, it was much slower with LLVM. The simple 
program is just a compact way to demonstrate that TLS is very 
slow in one specific type of program. It would be great to see 
another program that demonstrates that TLS is actually faster in 
some use cases. However, since it is so much slower here, I think 
you'll have a hard time finding such an example. The simple 
program shows TLS to be almost 2 orders of magnitude slower. It 
may not be that much slower in other types of programs, but with 
numbers like that it seems obvious that something is wrong.
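
For anyone who wants to try this, here is a minimal sketch of the 
kind of micro-benchmark I mean. It is not the original program; 
the names (tlsCounter, globalCounter, bumpTls, bumpGlobal) and 
the iteration count are made up for illustration. It assumes a 
recent Phobos that provides std.datetime.stopwatch. Module-level 
variables in D are thread-local by default, and __gshared opts 
out of that, so the two loops differ only in how the counter is 
addressed.

```d
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writefln;

// Module-level variables are thread-local by default in D,
// so accesses to this counter go through the TLS mechanism.
size_t tlsCounter;

// __gshared opts out of TLS: one plain global, as in C.
__gshared size_t globalCounter;

// Increment the thread-local counter n times; return its final value.
size_t bumpTls(size_t n)
{
    tlsCounter = 0;
    foreach (i; 0 .. n)
        ++tlsCounter;
    return tlsCounter;
}

// Same loop, but on the __gshared (non-TLS) counter.
size_t bumpGlobal(size_t n)
{
    globalCounter = 0;
    foreach (i; 0 .. n)
        ++globalCounter;
    return globalCounter;
}

void main()
{
    enum size_t iterations = 100_000_000; // arbitrary, illustrative count

    auto sw = StopWatch(AutoStart.yes);
    auto tlsResult = bumpTls(iterations);
    auto tlsTime = sw.peek();

    sw.reset(); // resets elapsed time; the stopwatch keeps running
    auto globalResult = bumpGlobal(iterations);
    auto globalTime = sw.peek();

    // Printing the results keeps the work observable, but an
    // optimizer may still collapse either loop into a single store.
    writefln("TLS:       %s (counter = %s)", tlsTime, tlsResult);
    writefln("__gshared: %s (counter = %s)", globalTime, globalResult);
}
```

Build it with both compilers at the same optimization level and 
compare the two timings. As noted in the quoted post, the 
compiler may hoist the TLS address out of the loop or fold the 
loop away entirely, so it is worth checking the generated 
assembly before trusting any ratio this prints.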