Is 2X faster large memcpy interesting?

Fri Mar 27 01:54:35 PDT 2009

Thomas Moran wrote:
> On 26/03/2009 20:08, Don wrote:
>> BTW: I tested the memcpy() code provided in AMD's 1992 optimisation
>> manual, and in Intel's 2007 manual. Only one of them actually gave any
>> benefit when run on a 2008 Intel Core2 -- which was it? (Hint: it wasn't
>> Intel!)
>> I've noticed that AMD's docs are usually greatly superior to Intel's, but
>> this time the difference is unbelievable.
> 
> Don, have you seen Agner Fog's memcpy() and memmove() implementations 
> included with the most recent versions of his manuals? In the unaligned 
> case they read two XMM words and shift/combine them into the target 
> alignment, so all loads and stores are aligned. Pretty cool.
> 
> He says (modestly):
> 
> ; This method is 2 - 6 times faster than the implementations in the
> ; standard C libraries (MS, Gnu) when src or dest are misaligned.
> ; When src and dest are aligned by 16 (relative to each other) then this
> ; function is only slightly faster than the best standard libraries.

I'm aware of Agner's code (it was a motivation), but I deliberately 
haven't looked at it since it's GPLed.
BTW, I already have copy-with-shifting implemented for the bigint 
implementation.
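
For readers unfamiliar with the shift/combine idea mentioned above, here is a
minimal sketch in portable C. It is neither Agner Fog's code nor the bigint
routine; the function name copy_shifted and its parameters are hypothetical,
and it uses 64-bit words instead of XMM registers to stay self-contained. Each
output word is built from two adjacent whole-word loads, so the source is never
read at a misaligned address.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only.
 * Fills dst[0..nwords-1] with the 64-bit values that begin shift_bytes
 * bytes (0..7) into the src word stream, counting bytes from the least
 * significant end of each word.
 * When shift_bytes != 0 this reads src[0..nwords], so src must hold
 * nwords + 1 words. */
void copy_shifted(uint64_t *dst, const uint64_t *src,
                  size_t nwords, unsigned shift_bytes)
{
    if (shift_bytes == 0) {
        /* No shifting needed; a shift by 64 bits would be undefined in C. */
        for (size_t i = 0; i < nwords; i++)
            dst[i] = src[i];
        return;
    }

    unsigned rs = 8 * shift_bytes;   /* bits dropped from the low word   */
    unsigned ls = 64 - rs;           /* bits taken from the next word    */
    uint64_t lo = src[0];

    for (size_t i = 0; i < nwords; i++) {
        uint64_t hi = src[i + 1];            /* one whole-word load per output */
        dst[i] = (lo >> rs) | (hi << ls);    /* combine the two halves         */
        lo = hi;                             /* reuse the word next iteration  */
    }
}
```

On a little-endian machine, calling copy_shifted(dst, src, n, k) gives the same
bytes as copying 8*n bytes starting k bytes past src, while only ever loading
whole aligned words. The same loop shape is what a multi-word shift in a bigint
library looks like.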