Is 2X faster large memcpy interesting?
Don
nospam at nospam.com
Fri Mar 27 01:54:35 PDT 2009
Thomas Moran wrote:
> On 26/03/2009 20:08, Don wrote:
>> BTW: I tested the memcpy() code provided in AMD's 1992 optimisation
>> manual, and in Intel's 2007 manual. Only one of them actually gave any
>> benefit when run on a 2008 Intel Core2 -- which was it? (Hint: it wasn't
>> Intel!)
>> I've noticed that AMD's docs are usually greatly superior to Intel's, but
>> this time the difference is unbelievable.
>
> Don, have you seen Agner Fog's memcpy() and memmove() implementations
> included with the most recent versions of his manuals? In the unaligned
> case they read two XMM words and shift/combine them into the target
> alignment, so all loads and stores are aligned. Pretty cool.
>
> He says (modestly):
>
> ; This method is 2 - 6 times faster than the implementations in the
> ; standard C libraries (MS, Gnu) when src or dest are misaligned.
> ; When src and dest are aligned by 16 (relative to each other) then this
> ; function is only slightly faster than the best standard libraries.
I'm aware of Agner's code (it was a motivation), but I deliberately
haven't looked at it since it's GPLed.
BTW, I already have copy-with-shifting implemented for the implementation
of bigint.
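
For anyone unfamiliar with the technique, here is a rough sketch of
copy-with-shifting in portable C. It is neither Agner's code nor the bigint
code. As simplifying assumptions it uses 64-bit words instead of XMM
registers, requires a little-endian machine and non-overlapping buffers, and
the name shift_copy is made up for illustration. The point is only to show
how two aligned source words are shifted and combined into one aligned
destination word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy n bytes between non-overlapping buffers.
   In the misaligned case the main loop reads aligned 8-byte source
   words and combines each adjacent pair into one aligned 8-byte store.
   Assumes little-endian byte order. */
void *shift_copy(void *dst_, const void *src_, size_t n)
{
    unsigned char *dst = dst_;
    const unsigned char *src = src_;

    /* Head: copy single bytes until dst is 8-byte aligned. */
    while (n > 0 && ((uintptr_t)dst & 7) != 0) {
        *dst++ = *src++;
        n--;
    }

    size_t off = (uintptr_t)src & 7;  /* src misalignment relative to dst */

    if (off == 0) {
        /* src and dst are mutually aligned: plain word-by-word copy. */
        while (n >= 8) {
            memcpy(dst, src, 8);
            dst += 8; src += 8; n -= 8;
        }
    } else if (n >= 16) {
        unsigned rs = (unsigned)(8 * off);  /* 8..56, so shifts are defined */
        unsigned ls = 64 - rs;

        /* Build the first word from the 8-off leading source bytes,
           placed where an aligned load would have put them. This avoids
           reading bytes in front of the source buffer. */
        uint64_t lo = 0;
        memcpy((unsigned char *)&lo + off, src, 8 - off);

        /* Every later source load happens at an 8-byte aligned address. */
        const unsigned char *s = src + (8 - off);
        assert(((uintptr_t)s & 7) == 0);

        /* n >= 16 guarantees the 8 bytes at s lie inside the source. */
        while (n >= 16) {
            uint64_t hi, out;
            memcpy(&hi, s, 8);               /* load from aligned address */
            out = (lo >> rs) | (hi << ls);   /* shift and combine */
            memcpy(dst, &out, 8);            /* store to aligned address */
            lo = hi;
            s += 8; src += 8; dst += 8; n -= 8;
        }
    }

    /* Tail: whatever is left, one byte at a time. */
    while (n > 0) {
        *dst++ = *src++;
        n--;
    }
    return dst_;
}
```

The fixed-size memcpy calls are there to stay clear of aliasing problems;
mainstream optimising compilers usually turn them into single loads and
stores, but that is worth checking in the generated code. A real SSE version
would do the same thing with 16-byte registers and byte-shift instructions.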