Sneak preview into std.allocator's porcelain

deadalnix via Digitalmars-d digitalmars-d at puremagic.com
Sun May 10 17:06:38 PDT 2015


On Sunday, 10 May 2015 at 16:56:27 UTC, Jacob Carlborg wrote:
> On 2015-05-08 21:55, Andrei Alexandrescu wrote:
>
>> a few measurements would be in order. -- Andrei
>
> Be sure you do that on more than one platform. For example, the 
> emulated TLS on OS X can be quite slow, I've heard.

I was trying to come up with a good benchmark for TLS, but it is 
remarkably difficult.

Usually, you have one TLS segment per linker module (meaning one 
for your app + one per shared object). One of these segments is 
kept around by the compiler, ready to be used.

Once you access TLS in your code, things go as follows:
1/ The compiler knows you have the right segment around, and so 
no segment lookup needs to take place.
2/ The compiler doesn't know it, but you have the right segment. 
In which case you do a round trip through the runtime, but take 
the fast path.
3/ You have the wrong segment, in which case the runtime has to 
figure out which segment is the right one, and that is slow and 
often implies locks, and even, in worst-case scenarios, a round 
trip to the OS.
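
As a rough analogue of cases 1/ and 2/, here is a minimal C++ 
sketch, assuming GCC or Clang on an ELF platform. The `tls_model` 
attribute is a real GCC/Clang extension, but the variable and 
function names are made up for illustration, and the mapping onto 
the cases above is an approximation, not a description of what 
any D compiler does.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical counters; names are illustrative only.

// initial-exec: the offset from the thread pointer is fixed at load
// time, so an access needs no call into the TLS runtime (close to 1/).
thread_local std::uint64_t static_counter
    __attribute__((tls_model("initial-exec"))) = 0;

// global-dynamic: in position-independent code for a shared object,
// each access may go through the TLS runtime (__tls_get_addr on ELF),
// which is close to 2/ or 3/. When linking into an executable the
// linker typically relaxes this to a cheaper model.
thread_local std::uint64_t dynamic_counter
    __attribute__((tls_model("global-dynamic"))) = 0;

std::uint64_t bump_static()  { return ++static_counter; }
std::uint64_t bump_dynamic() { return ++dynamic_counter; }

// The two variables are distinct: bumping one leaves the other alone.
void sanity_check() {
    const std::uint64_t before = dynamic_counter;
    bump_static();
    assert(dynamic_counter == before);
}
```

Comparing the generated assembly for `bump_static` and 
`bump_dynamic` when built with -fPIC as a shared object is 
probably the quickest way to see the difference.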

A good benchmark must have TLS accessed from the application and 
from some shared object, be big enough that the compiler does not 
see through all these accesses (or it will simply keep both 
segments around, which it won't do by default, but will if the 
need is apparent), and have a realistic access pattern (it is 
fairly easy to trash performance by ping-ponging between the 2 
TLS segments, but that is probably not very realistic).
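
To make the access-pattern point concrete, here is a minimal 
timing-harness sketch in C++. It is only a skeleton under loud 
assumptions: all names are hypothetical, `noinline` is a 
GCC/Clang attribute, and both variables live in the same module 
here, so by itself it does NOT meet the criteria above. In a real 
benchmark `touch_lib` and its variable would have to be moved into 
a separate shared object.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>

// Hypothetical TLS variable standing in for application-side state.
thread_local std::uint64_t app_counter = 0;

// Stand-in for TLS state owned by a shared object. In this sketch it
// is in the same module, so only one TLS segment is really involved.
thread_local std::uint64_t lib_counter = 0;

// noinline keeps the compiler from merging the accesses into the loop.
__attribute__((noinline)) void touch_app() { ++app_counter; }
__attribute__((noinline)) void touch_lib() { ++lib_counter; }

// Alternates between the two variables in runs of `run_length`
// accesses and returns the average time per access in nanoseconds.
// run_length == 1 is the ping-pong pattern; larger values are less
// hostile.
double ns_per_access(std::size_t iterations, std::size_t run_length) {
    assert(iterations > 0 && run_length > 0);
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        if ((i / run_length) % 2 == 0) {
            touch_app();
        } else {
            touch_lib();
        }
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iterations);
}
```

Comparing `ns_per_access(n, 1)` against `ns_per_access(n, 1000)` 
for a large `n` would show how much the ping-pong pattern costs, 
once the second variable really lives in another module.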

Long story short, I'm worried about this TLS issue, but I'd 
welcome more data.

