Ideas from the Chapel language (swap)

Wed Oct 3 21:39:25 PDT 2007

On Wed, Oct 03, 2007 at 06:25:08PM -0700, Gregor Richards wrote:

> For the ridiculously-insane:
>
> void swap(T)(ref T a, ref T b)
> {
>     synchronized {
>         // this should be some kind of static for ...
>         for (size_t i = 0; i < (a.sizeof/size_t.sizeof); i++) {
>             (cast(size_t*) &a)[i] ^= (cast(size_t*) &b)[i];
>             (cast(size_t*) &b)[i] = (cast(size_t*) &a)[i] ^ (cast(size_t*) &b)[i];
>             (cast(size_t*) &a)[i] ^= (cast(size_t*) &b)[i];
>         }
>     }
> }
>
> Add some loop unrolling and that's more efficient than memcpy :P

And rather devastating if another thread causes a garbage collection.
Doesn't the synchronized just lock this section of code, not the whole
program?

But why would it be any more efficient than having the inside of the loop
just say:

   size_t tmp = (cast(size_t*) &a)[i];
   (cast(size_t*) &a)[i] = (cast(size_t*) &b)[i];
   (cast(size_t*) &b)[i] = tmp;

This does 2 reads and 2 writes.  The xor version does 4 reads and 3 writes.
How could it be faster?
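For comparison, a sketch of the whole function built around that temporary-variable loop body, in current D2 syntax. The names `wordSwap` and `Rec` are hypothetical. Unlike the quoted version, it does not assume `T.sizeof` is a multiple of `size_t.sizeof`: it only takes the word-at-a-time path when `T` is at least `size_t`-aligned, and it finishes any leftover bytes one at a time. It also leaves out the `synchronized` block, since locking is a separate question from the swap itself.

```d
import std.stdio : writeln;

/// Hypothetical helper: swaps two values word by word through a temporary,
/// then byte by byte for anything not covered by whole, aligned words.
void wordSwap(T)(ref T a, ref T b)
{
    // Only reinterpret as size_t words when T is guaranteed to be
    // size_t-aligned; otherwise fall back to the byte loop entirely.
    enum size_t words =
        (T.alignof >= size_t.alignof) ? T.sizeof / size_t.sizeof : 0;

    auto pa = cast(size_t*) &a;
    auto pb = cast(size_t*) &b;
    foreach (i; 0 .. words)
    {
        size_t tmp = pa[i];  // 1 read
        pa[i] = pb[i];       // 1 read, 1 write
        pb[i] = tmp;         // 1 write
    }

    // Remaining bytes (the quoted xor version silently skips these).
    auto ba = cast(ubyte*) &a;
    auto bb = cast(ubyte*) &b;
    foreach (j; words * size_t.sizeof .. T.sizeof)
    {
        ubyte tmp = ba[j];
        ba[j] = bb[j];
        bb[j] = tmp;
    }
}

struct Rec { long id; double val; }

void main()
{
    int[3] p = [1, 2, 3];
    int[3] q = [4, 5, 6];
    wordSwap(p, q);               // 12 bytes, only int-aligned
    writeln(p, " ", q);           // [4, 5, 6] [1, 2, 3]

    Rec r1 = Rec(1, 1.5), r2 = Rec(2, 2.5);
    wordSwap(r1, r2);             // 16 bytes, size_t-aligned on common targets
    writeln(r1.id, " ", r2.id);   // 2 1
}
```

Per word this is the 2 reads and 2 writes counted above, and the compiler is free to keep `tmp` in a register.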

The xor trick really only helps if you are on an architecture where you
don't have a register to spare.  Shuffling through a register is going to
be much faster than multiple reads and writes to memory.
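A small D2 sketch putting the two approaches side by side on plain values. `tempSwap` and `xorSwap` are hypothetical names for illustration; current Phobos ships the temporary-variable form as `std.algorithm.mutation.swap`. The xor version saves the temporary, but each of its three steps depends on the previous one, and it has one more practical catch: passing the same variable for both arguments zeroes it, because `a ^ a == 0`.

```d
import std.stdio : writeln;

/// Swap through a temporary; the compiler can keep `tmp` in a register.
void tempSwap(T)(ref T a, ref T b)
{
    T tmp = a;
    a = b;
    b = tmp;
}

/// The xor trick, restricted to integral types. No temporary is needed,
/// but every step depends on the result of the step before it.
void xorSwap(T)(ref T a, ref T b) if (__traits(isIntegral, T))
{
    a ^= b;
    b ^= a;
    a ^= b;
}

void main()
{
    int x = 3, y = 7;
    tempSwap(x, y);
    writeln(x, " ", y);  // 7 3
    xorSwap(x, y);
    writeln(x, " ", y);  // 3 7

    // Same variable passed twice: the temporary version is harmless,
    // the xor version clears the value.
    int z = 5;
    tempSwap(z, z);
    writeln(z);          // 5
    xorSwap(z, z);
    writeln(z);          // 0
}
```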

David