Replacing C's memcpy with a D implementation
Walter Bright
newshound2 at digitalmars.com
Sun Jun 10 22:13:13 UTC 2018
On 6/10/2018 5:49 AM, Mike Franklin wrote:
> [...]
One source of noise in the results is src and dst being global variables.
Global variables in D are thread-local (TLS) by default, and TLS access can be
complex (many instructions) and is influenced by the -fPIC switch. Worse, global
variable access is not optimized in dmd because of aliasing problems.
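
To make the TLS point concrete, here is a minimal sketch (not taken from the benchmark under discussion). The names tlsCounter, sharedCounter and observeFromNewThread are made up for illustration. It shows that a plain module-level variable gets a separate instance per thread, while a __gshared one is a single process-wide instance.

```d
import core.thread : Thread;

int tlsCounter;               // module-level: thread-local by default
__gshared int sharedCounter;  // a single instance shared by all threads

// Hypothetical helper: sets both variables in the calling thread, then
// reports what a freshly started thread sees for each of them.
int[2] observeFromNewThread()
{
    tlsCounter = 1;
    sharedCounter = 1;

    int[2] seen;
    auto t = new Thread({
        seen[0] = tlsCounter;     // the new thread's own copy, still 0
        seen[1] = sharedCounter;  // the shared instance, set to 1 above
    });
    t.start();
    t.join();
    return seen;
}
```

Whether the TLS access actually costs many instructions depends on the platform and on switches such as -fPIC, so the generated code still has to be inspected.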
The solution is to pass src, dst, and length to the copy function as function
parameters (and make sure function inlining is off).
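
A rough sketch of what that could look like, assuming a plain byte loop as a stand-in for the routine being measured. copyBytes and timeCopy are hypothetical names, not the code from the original benchmark.

```d
import core.time : Duration;
import std.datetime.stopwatch : AutoStart, StopWatch;

// Hypothetical routine under test: a simple byte loop standing in for the
// D memcpy replacement. pragma(inline, false) asks the compiler not to
// inline it into the timing loop.
pragma(inline, false)
void copyBytes(ubyte* dst, const(ubyte)* src, size_t length)
{
    foreach (i; 0 .. length)
        dst[i] = src[i];
}

// Times `iterations` copies of `length` bytes. The buffers are locals and
// reach copyBytes as parameters, so no module-level (TLS) variable is
// touched inside the timed loop.
Duration timeCopy(size_t length, size_t iterations)
{
    auto src = new ubyte[](length);
    auto dst = new ubyte[](length);
    foreach (i, ref b; src)
        b = cast(ubyte) i;

    auto sw = StopWatch(AutoStart.yes);
    foreach (_; 0 .. iterations)
        copyBytes(dst.ptr, src.ptr, length);
    sw.stop();

    assert(dst == src); // sanity check that the copy really happened
    return sw.peek();
}
```

Calling something like timeCopy(4096, 100_000) from main and printing the result gives a number, but an optimizer can still rearrange the loop, so disassemble the object file to confirm the loop does what you think it does.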
In light of this, I want to BEAT THE DEAD HORSE once again and assert that if
the assembly generated for a benchmark is not examined, the results can be
severely misleading. I've seen this happen again and again. In this case, TLS
access is likely being benchmarked, not memcpy.
BTW, the relative timing of rep movsb can be highly dependent on which CPU chip
you're using.