LLVM and TLS
via digitalmars-d-ldc
digitalmars-d-ldc at puremagic.com
Sun Feb 22 09:36:47 PST 2015
On Sunday, 22 February 2015 at 04:33:58 UTC, Dan Olson wrote:
> Hmm, you got me thinking. A mfence should not be needed for TLS so in a
> MT program, expensive TLS lookup could still win. If cache is blown,
> wouldn't time to reload cache begin to dominate? I know all of this is
> very architecture dependent, but I have been wary of the number of
> instructions to do TLS lookup compared to shared. Perhaps I should not.
> Am I thinking correctly?
The real problem is synthetic benchmarks that compare apples to
oranges. The "problem" may disappear once the TLS tables are
loaded into the cache, or if the compiler has hoisted the TLS
lookup out of the loop and kept the result in a register (which
also has a hidden cost). An x86 cache miss costs perhaps 100-200
cycles, and a 3rd level cache load or a full barrier 30-40 cycles,
but a pure read or write barrier is only a few cycles... What is
the hidden cost of D TLS versus the optimal codegen for a program?
I guess you have to compare C vs D on a set of complex programs to
figure it all out.
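
To make the apples/oranges point concrete, here is a rough sketch in C
(C11 _Thread_local) of the kind of synthetic loop I mean. All names are
made up for illustration, nothing here is taken from LDC or druntime,
and the numbers it produces will depend heavily on compiler, flags and
platform, so treat it as a toy rather than a measurement method.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* All names here are hypothetical; nothing is taken from LDC or druntime. */
static _Thread_local volatile uint64_t tls_counter; /* one copy per thread */
static volatile uint64_t shared_counter;            /* one copy per process */

/* Reads and writes the thread-local on every iteration. */
uint64_t bump_tls(uint64_t n) {
    tls_counter = 0;
    for (uint64_t i = 0; i < n; ++i)
        tls_counter += i;
    return tls_counter;
}

/* Same loop against an ordinary global: no TLS address lookup involved. */
uint64_t bump_shared(uint64_t n) {
    shared_counter = 0;
    for (uint64_t i = 0; i < n; ++i)
        shared_counter += i;
    return shared_counter;
}

/* Accumulates in a local and touches the TLS variable only once.
   An optimizer is free to fold this loop away entirely, which is
   exactly how a synthetic benchmark can make the "problem" vanish. */
uint64_t bump_tls_hoisted(uint64_t n) {
    uint64_t local = 0;
    for (uint64_t i = 0; i < n; ++i)
        local += i;
    tls_counter = local;
    return tls_counter;
}

/* Returns CPU seconds spent in fn(n); the function's result goes to *result. */
double time_it(uint64_t (*fn)(uint64_t), uint64_t n, uint64_t *result) {
    assert(fn && result);
    clock_t start = clock();
    *result = fn(n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_it(bump_tls, n, &r), time_it(bump_shared, n, &r) and
time_it(bump_tls_hoisted, n, &r) from a main() with a large n gives three
numbers, but they mostly tell you what the optimizer did to one tiny
loop, not what TLS costs in a real program.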