Is 2X faster large memcpy interesting?
JC
jcrapuchettes at gmail.com
Fri Mar 27 09:11:18 PDT 2009
The applications I write usually work with matrices from 600x600 up to
2000x2000, and since they hold doubles, that is a good chunk of memory
(a 2000x2000 double matrix is about 32 MB, well beyond any cache).
Unleash the optimizations!
JC
Don wrote:
> The next D2 runtime will include my cache-size detection code. This
> makes it possible to write a cache-aware memcpy, using (for example)
> non-temporal writes when the arrays being copied exceed the size of the
> largest cache.
> In my tests, it gives a speed-up of approximately 2X in such cases.
> The downside is, it's a fair bit of work to implement, and it only
> affects extremely large arrays, so I fear it's basically irrelevant (it
> probably won't help arrays smaller than 32 KB). Do people actually copy
> megabyte-sized arrays?
> Is it worth spending any more time on it?
>
> BTW: I tested the memcpy() code provided in AMD's 1992 optimisation
> manual, and in Intel's 2007 manual. Only one of them actually gave any
> benefit when run on a 2008 Intel Core2 -- which was it? (Hint: it wasn't
> Intel!)
> I've noticed that AMD's docs are usually greatly superior to Intel's,
> but this time the difference is unbelievable.
More information about the Digitalmars-d mailing list