Is 2X faster large memcpy interesting?

Thu Mar 26 16:33:47 PDT 2009

On 26/03/2009 20:08, Don wrote:
> BTW: I tested the memcpy() code provided in AMD's 1992 optimisation
> manual, and in Intel's 2007 manual. Only one of them actually gave any
> benefit when run on a 2008 Intel Core2 -- which was it? (Hint: it wasn't
> Intel!)
> I've noticed that AMD's docs are usually greatly superior to Intel's, but
> this time the difference is unbelievable.

Don, have you seen Agner Fog's memcpy() and memmove() implementations 
included with the most recent versions of his manuals? In the unaligned 
case they read two XMM words and shift/combine them into the target 
alignment, so all loads and stores are aligned. Pretty cool.
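
For anyone who hasn't looked at the technique, here is a minimal portable C
sketch of the shift/combine idea. It is not Agner's code: it uses 64-bit
words in place of XMM registers, assumes a little-endian machine, and the
names copy_shifted and copy_shifted_selftest are made up for illustration.
The source data starts `off` bytes into an aligned word array, so every
load and store in the loop touches a whole aligned word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: write nwords aligned 8-byte words to dst, where the
   source data begins `off` bytes (0..7) into the aligned word array src.
   Assumes little-endian byte order. When off != 0 the loop reads
   src[0] .. src[nwords], so the caller must supply nwords + 1 words. */
void copy_shifted(uint64_t *dst, const uint64_t *src,
                  unsigned off, size_t nwords)
{
    assert(off < 8);
    if (off == 0) {                 /* already aligned: nothing to combine */
        memcpy(dst, src, nwords * sizeof *dst);
        return;
    }
    unsigned rs = 8 * off;          /* bits to drop from the low word   */
    unsigned ls = 64 - rs;          /* bits to take from the next word  */
    uint64_t lo = src[0];
    for (size_t i = 0; i < nwords; i++) {
        uint64_t hi = src[i + 1];   /* aligned load of the next word    */
        dst[i] = (lo >> rs) | (hi << ls);   /* combine, aligned store   */
        lo = hi;                    /* each source word is loaded once  */
    }
}

/* Compare against a plain byte-offset memcpy for every offset 0..7.
   Returns 1 if all offsets match, 0 otherwise. */
int copy_shifted_selftest(void)
{
    uint64_t src[5];
    unsigned char bytes[sizeof src];
    for (size_t i = 0; i < sizeof bytes; i++)
        bytes[i] = (unsigned char)(i * 7 + 1);
    memcpy(src, bytes, sizeof src);

    for (unsigned off = 0; off < 8; off++) {
        uint64_t got[4], want[4];
        copy_shifted(got, src, off, 4);
        memcpy(want, bytes + off, sizeof want);
        if (memcmp(got, want, sizeof want) != 0)
            return 0;
    }
    return 1;
}
```

In an SSE version the two scalar shifts would presumably become 16-byte
byte-shift instructions (PSRLDQ/PSLLDQ combined with POR, or PALIGNR where
SSSE3 is available), with the head and tail bytes handled separately.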

He says (modestly):

; This method is 2 - 6 times faster than the implementations in the
; standard C libraries (MS, Gnu) when src or dest are misaligned.
; When src and dest are aligned by 16 (relative to each other) then this
; function is only slightly faster than the best standard libraries.