Is 2X faster large memcpy interesting?

Georg Wrede georg.wrede at iki.fi
Thu Mar 26 13:49:09 PDT 2009


Don wrote:
> The next D2 runtime will include my cache-size detection code. This 
> makes it possible to write a cache-aware memcpy, using (for example) 
> non-temporal writes when the arrays being copied exceed the size of the 
> largest cache.
> In my tests, it gives a speed-up of approximately 2X in such cases.
> The downside is, it's a fair bit of work to implement, and it only 
> affects extremely large arrays, so I fear it's basically irrelevant (It 
> probably won't help arrays < 32K in size). Do people actually copy 
> megabyte-sized arrays?
> Is it worth spending any more time on it?
> 
> 
> BTW: I tested the memcpy() code provided in AMD's 1992 optimisation 
> manual, and in Intel's 2007 manual. Only one of them actually gave any 
> benefit when run on a 2008 Intel Core2 -- which was it? (Hint: it wasn't 
> Intel!)
> I've noticed that AMD's docs are usually greatly superior to Intel's, but 
> this time the difference is unbelievable.
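
[For context, a minimal C sketch of the kind of cache-aware copy being described above -- this is not the actual druntime code. It falls back to plain memcpy for small copies and switches to SSE2 non-temporal stores (_mm_stream_si128) once the copy exceeds an assumed largest-cache-size threshold; the threshold value, the function names, and the 16-byte alignment assumption are all illustrative only.]

/* Minimal sketch of a cache-size-aware copy, assuming SSE2 and a
 * hypothetical cache-size threshold; not the real runtime implementation. */
#include <emmintrin.h>  /* SSE2: _mm_stream_si128, _mm_load_si128 */
#include <stddef.h>
#include <string.h>

/* Hypothetical placeholder for the detected largest cache size. */
static size_t largest_cache_size = 4 * 1024 * 1024;

void cache_aware_copy(void *dst, const void *src, size_t n)
{
    /* Small copies: let the ordinary memcpy use the caches normally. */
    if (n <= largest_cache_size) {
        memcpy(dst, src, n);
        return;
    }

    /* Large copies: stream the data past the caches with non-temporal
     * stores so the destination does not evict useful cache lines.
     * Assumes dst and src are 16-byte aligned for simplicity. */
    const __m128i *s = (const __m128i *)src;
    __m128i *d = (__m128i *)dst;
    size_t blocks = n / 16;

    for (size_t i = 0; i < blocks; i++) {
        _mm_stream_si128(&d[i], _mm_load_si128(&s[i]));
    }
    _mm_sfence();  /* make the streamed stores globally visible */

    /* Copy any tail bytes the 16-byte loop did not cover. */
    memcpy((char *)dst + blocks * 16, (const char *)src + blocks * 16,
           n % 16);
}
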

What's the alternative? What would you do instead? Is there something 
cooler or more important for D to do?

(IMHO, if the other alternatives have any merit, then I'd vote for them.)

But then again, you've already invested in this, and it clearly 
interests you. Laborious, yes, but it sounds fun.


