Replacing C's memcpy with a D implementation

Walter Bright newshound2 at digitalmars.com
Sun Jun 10 22:13:13 UTC 2018


On 6/10/2018 5:49 AM, Mike Franklin wrote:
> [...]

One source of noise in the results is src and dst being global variables. 
Global variables in D are thread-local (TLS) by default, and TLS access can be 
complex (many instructions) and is influenced by the -fPIC switch. Worse, 
global variable access is not optimized in dmd because of aliasing problems.
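
To make the TLS point concrete, here is a minimal sketch. The names tlsCounter 
and sharedCounter are made up for illustration and do not come from the 
benchmark under discussion. A plain module-level variable gets a fresh copy in 
each thread, while __gshared gives a single instance like a C global:

```d
import core.thread : Thread;

int tlsCounter;               // thread-local by default (hypothetical name)
__gshared int sharedCounter;  // one instance for the whole process (hypothetical name)

void main()
{
    tlsCounter = 1;
    sharedCounter = 1;

    int seenTls, seenShared;
    auto t = new Thread({
        seenTls = tlsCounter;       // this thread's own copy, still int.init
        seenShared = sharedCounter; // the same storage main wrote to
    });
    t.start();
    t.join();

    assert(seenTls == 0);
    assert(seenShared == 1);
}
```

How many instructions each access costs depends on the compiler, the platform, 
and switches such as -fPIC, so the generated code is the only reliable guide.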

The solution is to pass src, dst, and length to the copy function as function 
parameters (and make sure function inlining is off).
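
A rough sketch of that arrangement follows. It assumes a hypothetical 
copyUnderTest function that just forwards to C's memcpy as a stand-in for the 
routine being measured; the buffer size and iteration count are arbitrary, and 
this is not the benchmark from the original post:

```d
import core.stdc.string : memcpy;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writeln;

// Hypothetical routine under test; swap in the D implementation being measured.
// pragma(inline, false) asks the compiler not to inline it into the timing loop.
pragma(inline, false)
void copyUnderTest(void* dst, const(void)* src, size_t length)
{
    memcpy(dst, src, length);
}

void main()
{
    enum size_t length = 4096;   // arbitrary size for illustration
    enum iterations = 100_000;   // arbitrary repeat count

    // Locals rather than module-level variables, so the timed loop
    // does not go through TLS to reach the buffers.
    auto src = new ubyte[length];
    auto dst = new ubyte[length];
    foreach (i, ref b; src)
        b = cast(ubyte) i;

    auto sw = StopWatch(AutoStart.yes);
    foreach (_; 0 .. iterations)
        copyUnderTest(dst.ptr, src.ptr, length);
    sw.stop();

    assert(dst == src);          // sanity check that the copy happened
    writeln(length, " bytes x ", iterations, ": ", sw.peek);
}
```

Even with this layout, the disassembly of the timed loop should still be 
checked to confirm that the call is really there and nothing else crept in.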

In light of this, I want to BEAT THE DEAD HORSE once again and assert that if 
the assembler generated by a benchmark is not examined, the results can be 
severely misleading. I've seen this happen again and again. In this case, TLS 
access is likely being benchmarked, not memcpy.

BTW, the relative timing of rep movsb can be highly dependent on which CPU 
you're using.

