LLVM and TLS
Jonathan Marler via digitalmars-d-ldc
digitalmars-d-ldc at puremagic.com
Sun Feb 22 20:10:28 PST 2015
On Sunday, 22 February 2015 at 17:36:49 UTC, Ola Fosheim Grøstad
wrote:
> The problem is really in synthetic benchmarks that are comparing
> apples/oranges. The "problem" may disappear once TLS tables are
> loaded into the cache or if the compiler has moved the
> "problem" outside of the loop and retained it in a register
> (which also has a hidden cost). An x86 cache miss is perhaps
> 100-200 cycles and a 3rd level cache load/full barrier is 30-40
> cycles, but a pure read or write barrier is only a few
> cycles... What is the hidden cost of D TLS versus the optimal
> codegen for a program? I guess you have to compare C vs D on a
> set of complex programs to figure it all out.
Yes, I agree that you can't determine the general performance of
TLS from such a simple program.
Here's what happened: I was writing a program that could
optionally use TLS memory. When I turned on TLS memory it slowed
down considerably, but only when using an LLVM compiler. No
matter how I used TLS, it was much, much slower when using LLVM.
The simple program is just an easy way to demonstrate that TLS
is very slow in one specific type of program. It would be great
to see another program that demonstrates that TLS is actually
faster in some use cases. However, since it is so much slower, I
think you'll have a hard time finding such an example. The simple
program shows TLS being almost 2 orders of magnitude slower. It
may not be that much slower in other types of programs, but with
numbers like that it seems obvious that something is wrong.
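
For anyone who wants to try this kind of comparison, here is a minimal
sketch of a TLS versus non-TLS micro-benchmark in D. It is not the
program from this thread. The names tlsCounter, sharedCounter, bumpTls,
bumpShared and the iteration count are all made up for illustration, and
the timings will depend heavily on the compiler, the flags and the
platform. It relies only on the fact that module-level variables in D
are thread-local by default, while __gshared gives one instance per
process.

```d
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writefln;

// Hypothetical loop count, chosen only so the loops take measurable time.
enum ulong iterations = 10_000_000;

// Module-level variables are thread-local by default in D.
ulong tlsCounter;

// __gshared opts out of TLS: a single instance shared by the whole process.
__gshared ulong sharedCounter;

// Accumulates into the thread-local variable on every iteration.
ulong bumpTls()
{
    tlsCounter = 0;
    foreach (ulong i; 0 .. iterations)
        tlsCounter += i;
    return tlsCounter;
}

// Same loop, but accumulating into the process-wide variable.
ulong bumpShared()
{
    sharedCounter = 0;
    foreach (ulong i; 0 .. iterations)
        sharedCounter += i;
    return sharedCounter;
}

void main()
{
    auto sw = StopWatch(AutoStart.yes);
    immutable a = bumpTls();
    immutable tlsTime = sw.peek;

    sw.reset(); // resets elapsed time; the stopwatch keeps running
    immutable b = bumpShared();
    immutable sharedTime = sw.peek;

    // Both loops compute the same sum, so only the storage class differs.
    assert(a == b);
    writefln("TLS:       %s", tlsTime);
    writefln("__gshared: %s", sharedTime);
}
```

Building the same file with different compilers and comparing the two
printed durations gives a rough idea of the relative cost. An optimizer
may keep the counter in a register for the whole loop, which would hide
the difference entirely, so the result says little about real programs.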