Is 2X faster large memcpy interesting?

Andrei Alexandrescu SeeWebsiteForEmail at erdani.org
Thu Mar 26 14:34:15 PDT 2009


Don wrote:
> The next D2 runtime will include my cache-size detection code. This 
> makes it possible to write a cache-aware memcpy, using (for example) 
> non-temporal writes when the arrays being copied exceed the size of the 
> largest cache.
> In my tests, it gives a speed-up of approximately 2X in such cases.
> The downside is, it's a fair bit of work to implement, and it only 
> affects extremely large arrays, so I fear it's basically irrelevant (It 
> probably won't help arrays < 32K in size). Do people actually copy 
> megabyte-sized arrays?
> Is it worth spending any more time on it?

I'd think so. In this day and age it is appalling that we don't quite 
know how to quickly copy memory around. A long time ago I ran some 
measurements (http://www.ddj.com/cpp/184403799) and I was quite 
surprised. My musings were as true then as now. And now we're getting to 
the second freakin' Space Odyssey!
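
To picture what Don describes, here is a rough C++ sketch (not his druntime 
code; the 2 MB threshold, the alignment check, and the SSE2 intrinsics are 
purely illustrative) of a copy that switches to non-temporal stores once the 
block is bigger than the last-level cache:

    #include <cstddef>
    #include <cstring>
    #include <cstdint>
    #include <emmintrin.h>   // SSE2: _mm_load_si128, _mm_stream_si128, _mm_sfence

    void* big_copy(void* dst, const void* src, std::size_t n)
    {
        // Hypothetical threshold; in practice it would come from the
        // cache-size detection Don mentions.
        const std::size_t largestCache = 2 * 1024 * 1024;

        // Small or unaligned copies: plain memcpy is already fine.
        std::uintptr_t d = reinterpret_cast<std::uintptr_t>(dst);
        std::uintptr_t s = reinterpret_cast<std::uintptr_t>(src);
        if (n < largestCache || (d | s) % 16 != 0)
            return std::memcpy(dst, src, n);

        __m128i* dp = static_cast<__m128i*>(dst);
        const __m128i* sp = static_cast<const __m128i*>(src);
        std::size_t chunks = n / 16;
        for (std::size_t i = 0; i < chunks; ++i)
            // Streaming store: the data goes to memory without displacing
            // whatever is currently sitting in the caches.
            _mm_stream_si128(dp + i, _mm_load_si128(sp + i));
        _mm_sfence();   // make the streaming stores globally visible

        // Copy whatever tail is left after the 16-byte chunks.
        std::memcpy(static_cast<char*>(dst) + chunks * 16,
                    static_cast<const char*>(src) + chunks * 16,
                    n - chunks * 16);
        return dst;
    }

The point of the streaming stores is that the destination bypasses the cache, 
so a huge copy doesn't evict everything else; presumably that's where Don's 
~2X on blocks that don't fit in cache comes from.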

===============
Things are clearly hazy, aren't they? First off, maybe it came as a 
surprise to you that there's more than one way to fill and copy objects. 
Then, there's no single variant of fill and copy that works best on all 
compilers, data sets, and machines. (I guess if I tested the same code 
on a Celeron, which has less cache, I would have gotten very different 
results. To say nothing of other architectures.)

As a rule of thumb, it's generally good to use memcpy (and consequently 
fill-by-copy) if you can — for large data sets, memcpy doesn't make much 
difference, and for smaller data sets, it might be much faster. For 
cheap-to-copy objects, Duff's Device might perform faster than a simple 
for loop. Ultimately, all this is subject to your compiler's and 
machine's whims and quirks.

There is a very deep, and sad, realization underlying all this. We are 
in 2001, the year of the Space Odyssey. We've done electronic 
computing for more than 50 years now, and we strive to design more and 
more complex systems, with unsatisfactory results. Software development 
is messy. Could it be because the fundamental tools and means we use are 
low-level, inefficient, and not standardized? Just step out of the box 
and look at us — after 50 years, we're still not terribly good at 
filling and copying memory.
================
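
And since the excerpt brings up Duff's Device, here is roughly what that 
variant of fill looks like next to the plain loop it's racing against; a 
sketch only, and as the article says, which one wins is up to your compiler 
and machine:

    #include <cstddef>

    // Duff's Device: the switch jumps into the middle of an 8-way unrolled
    // loop, so the fill pays for one loop test per eight stores.
    void fill_duff(int* p, std::size_t n, int value)
    {
        if (n == 0) return;
        std::size_t passes = (n + 7) / 8;
        switch (n % 8) {
            case 0: do { *p++ = value;
            case 7:      *p++ = value;
            case 6:      *p++ = value;
            case 5:      *p++ = value;
            case 4:      *p++ = value;
            case 3:      *p++ = value;
            case 2:      *p++ = value;
            case 1:      *p++ = value;
                    } while (--passes > 0);
        }
    }

    // The straightforward loop it competes with.
    void fill_simple(int* p, std::size_t n, int value)
    {
        for (std::size_t i = 0; i < n; ++i)
            p[i] = value;
    }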


Andrei


