memset and related things

Don nospam at nospam.com
Sun Sep 20 11:18:13 PDT 2009


bearophile wrote:
> In a program I noticed that clearing an array in the inner loop was taking too much time. To solve the problem I ran many experiments, and I also produced the following test program.
> 
> The short summary is: to set an array of 4-byte integers to a certain constant, the best ways are:
> - if len <~ 20, then just use an inlined loop.
> - if 20 < len < 200_000 it's better to use a loop unrolled 4 times with the movaps instruction (8 times unrolled is a little worse).
> - if len > 200_000 a loop with the movntps instruction is better.
> 
> Generally these solutions are faster than memset(); only when len is about 150_000 is memset() a bit better than the 4-times-unrolled movaps loop.
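
The three size ranges quoted above could be wired together roughly as in the C sketch below. This is not bearophile's test program and not DMD's memset(). The function name fill_u32 and the two threshold constants are assumptions made for illustration, and the thresholds are the rough figures from the summary, which will differ between machines. The intrinsics _mm_store_ps and _mm_stream_ps correspond to movaps and movntps. On targets without SSE2 the sketch falls back to the plain loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>   /* SSE2; also pulls in the SSE intrinsics */
#define FILL_HAVE_SSE2 1
#else
#define FILL_HAVE_SSE2 0
#endif

/* Thresholds in elements, copied from the rough figures quoted above.
   They are assumptions, not tuned values; re-measure on your hardware. */
enum { SMALL_LEN = 20, HUGE_LEN = 200000 };

/* Hypothetical helper: set dst[0 .. len) to value. */
void fill_u32(uint32_t *dst, size_t len, uint32_t value)
{
    size_t i = 0;

#if FILL_HAVE_SSE2
    if (len > SMALL_LEN) {
        /* movaps and movntps need 16-byte aligned addresses, so store
           scalars until dst + i reaches a 16-byte boundary. */
        while (i < len && ((uintptr_t)(dst + i) & 15u) != 0)
            dst[i++] = value;
        assert(i == len || ((uintptr_t)(dst + i) & 15u) == 0);

        /* Broadcast the 32-bit pattern into all four lanes. */
        __m128 v = _mm_castsi128_ps(_mm_set1_epi32((int)value));

        if (len > HUGE_LEN) {
            /* Large arrays: non-temporal stores (movntps) bypass the cache. */
            for (; i + 4 <= len; i += 4)
                _mm_stream_ps((float *)(dst + i), v);
            _mm_sfence();   /* make the streamed stores globally visible */
        } else {
            /* Medium arrays: aligned stores (movaps), unrolled 4 times. */
            for (; i + 16 <= len; i += 16) {
                float *p = (float *)(dst + i);
                _mm_store_ps(p,      v);
                _mm_store_ps(p + 4,  v);
                _mm_store_ps(p + 8,  v);
                _mm_store_ps(p + 12, v);
            }
        }
    }
#endif

    /* Short arrays, the tail of longer ones, and non-SSE2 targets. */
    for (; i < len; i++)
        dst[i] = value;
}
```

A call such as fill_u32(arr, n, 0) then picks the path from n alone. Whether the dispatch overhead pays off for very short arrays is something only a benchmark on the target machine can settle.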

Yeah, DMD's memset() and memcpy() are far from optimal. IIRC memcpy() is 
even worse. I had done a bit of work on it, as well, but when I posted 
preliminary stuff, there wasn't much interest. The general feedback 
seemed to be that it'd be more useful to fix the compiler ICE bugs. So I 
did that <g>. It'll be interesting to see what the priorities are now -- 
maybe this stuff is of more interest now.

BTW the AMD manual for K7 (or might be K6 optimisation manual? don't 
exactly remember) goes into great detail about both memcpy() and 
memset(). Turns out there's about five different cases.


