Why are globals set to tls by default? and why is fast code ugly by default?

IGotD- nise at nise.com
Sat Apr 1 21:27:02 UTC 2023


On Saturday, 1 April 2023 at 15:02:12 UTC, Ali Çehreli wrote:
>
> Does anyone have documentation on why Rust and Zig do not do 
> thread local by default? I wonder what experience it was based 
> on.
>

I think it would be hard to get documentation on the rationale 
for that decision. Maybe you can get an answer in their forums, 
but I doubt it. For Rust, I think they based it on the idea that 
globals should have some kind of synchronization, which is 
enforced at compile time. Therefore TLS becomes a second-class 
citizen.
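
For comparison, here is a minimal D sketch of the default this 
thread is about: a module-level variable is thread-local unless 
it is marked shared (or __gshared), and shared data is then 
updated through core.atomic. The names perThread, processWide and 
work are made up for illustration only.

```d
import core.atomic : atomicLoad, atomicOp;
import core.thread : Thread;

int perThread;          // thread-local by default in D
shared int processWide; // one instance visible to every thread

// Hypothetical worker: bumps both counters 1000 times.
void work()
{
    foreach (_; 0 .. 1000)
    {
        ++perThread;                   // touches only this thread's copy
        atomicOp!"+="(processWide, 1); // shared data goes through atomics
    }
}

void main()
{
    auto a = new Thread(&work);
    auto b = new Thread(&work);
    a.start();
    b.start();
    a.join();
    b.join();

    assert(perThread == 0);                  // main's own copy was never touched
    assert(atomicLoad(processWide) == 2000); // both workers hit the same variable
}
```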

> Speaking of experience, I used to be a C++ programmer. We made 
> use of thread-local storage precisely zero times. I think it's 
> because the luminaries of the time did not even talk about it.
>

Yes, in "normal" programming you more or less never use 
TLS.

> With D, I take good advantage of thread-local storage. 
> Interestingly, I do that *only* for fast code.
>
> void foo(int arg) {
>     static int[] workArea;
>
>     if (workArea.length < neededFor(arg)) {
>         // increase length
>     }
>
>     // Use workArea
> }
>
> Now I can use any number of threads using foo and they will 
> have their independent work areas. Work area grows in amortized 
> fashion for each thread.
>
> I find the code above to be clean and beautiful. It is very 
> fast because there are no synchronization primitives needed 
> because no work area is shared between threads.
>

There is nothing beautiful about it other than the clean syntax. 
Why not just use a stack variable, which is thread local as well? 
TLS is often allocated on the stack on many systems anyway. 
Accessing TLS variables can be slower than accessing stack 
variables. TLS is not useful enough to justify its complexity.
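
To make the two options concrete, here is a rough D sketch, not a 
benchmark. fooTls is the quoted pattern with a function-local 
static; fooBuf takes a buffer from the caller, which can be a 
fixed-size array on the caller's stack. The helper neededFor and 
its sizing rule are placeholders I made up, as are the names 
fooTls and fooBuf.

```d
import core.thread : Thread;

// Hypothetical sizing rule, standing in for the helper in the quoted code.
size_t neededFor(int arg)
{
    return cast(size_t) arg * 2;
}

// Variant 1: function-local static, thread-local by default in D.
size_t fooTls(int arg)
{
    static int[] workArea; // one per thread, kept between calls
    if (workArea.length < neededFor(arg))
        workArea.length = neededFor(arg);
    workArea[] = arg; // use the work area
    return workArea.length;
}

// Variant 2: the caller supplies the buffer, e.g. an array on its stack.
size_t fooBuf(int arg, int[] workArea)
{
    assert(workArea.length >= neededFor(arg));
    workArea[0 .. neededFor(arg)] = arg; // use the work area
    return neededFor(arg);
}

void main()
{
    assert(fooTls(4) == 8);
    assert(fooTls(2) == 8); // still grown from the previous call

    size_t seenInOtherThread;
    auto t = new Thread({ seenInOtherThread = fooTls(1); });
    t.start();
    t.join();
    assert(seenInOtherThread == 2); // the new thread began with its own empty area

    int[16] stackBuf;
    assert(fooBuf(4, stackBuf[]) == 8);
}
```

The static variant keeps its grown buffer between calls in the 
same thread; the buffer variant leaves sizing and lifetime to the 
caller.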

>
> > It's common knowledge that accessing tls global is slow
> > 
> http://david-grs.github.io/tls_performance_overhead_cost_linux/
>
> "TLS global is slow" would be misleading because even the 
> article you linked explains right at the top, in the TL;DR, 
> that "TLS may be slow".

This depends on how it is implemented. TLS is really a forest: 
it can be implemented in many ways, and the cost also depends on 
where it is being accessed from (shared libraries, executable 
etc.). In general, TLS on x86 is accessed via 
fs:[-offset_to_variable]. That isn't that slow, but the 
complexity to get there is high. Keep in mind that the TLS area 
must be initialized for every thread that is created, which isn't 
ideal. fs:[] isn't always possible either, and then a function 
call is required, similar to a DLL symbol lookup. TLS is a turd 
which shouldn't have been created. They should have stopped at 
key/value pairs, which languages could then build on if they 
wanted. Now TLS is in the executable standards and it is a mess. 
x86 now has two ways of doing TLS (normal and TLS_DESC), just to 
make things even more complicated. A programmer never sees this 
mess, but as a systems programmer I see it and it is horrible.
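
As a sketch of what I mean by key/value pairs, assuming a POSIX 
system and taking pthread keys as the example of such an API: 
the key is process-wide, and each thread stores its own pointer 
under it. The names counterKey and threadCounter are made up, and 
the slot is malloc-style memory because the GC does not scan 
pthread slots.

```d
import core.stdc.stdlib : calloc, free;
import core.sys.posix.pthread : pthread_getspecific, pthread_key_create,
    pthread_setspecific;
import core.sys.posix.sys.types : pthread_key_t;
import core.thread : Thread;

__gshared pthread_key_t counterKey; // the key itself is shared by all threads

// Returns the calling thread's counter, allocating it on first use.
int* threadCounter()
{
    auto p = cast(int*) pthread_getspecific(counterKey);
    if (p is null)
    {
        p = cast(int*) calloc(1, int.sizeof); // zero-initialized
        assert(p !is null);
        pthread_setspecific(counterKey, p);
    }
    return p;
}

void main()
{
    // No destructor registered (null), so each thread frees its own slot.
    pthread_key_create(&counterKey, null);

    *threadCounter() += 5;

    int seen = -1;
    auto t = new Thread({
        seen = *threadCounter(); // this thread gets a fresh slot
        free(threadCounter());
    });
    t.start();
    t.join();

    assert(seen == 0);
    assert(*threadCounter() == 5); // main's value is unaffected
    free(threadCounter());
}
```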




More information about the Digitalmars-d-learn mailing list