X[]==Y[] is 7X slower than it should be -- why?

Fri Jun 20 08:02:25 PDT 2008

I've been doing some timing tests, and found that the built-in array 
comparison is amazingly slow.

Given:
uint [2000] X, Y;

bool b = (X[] == Y[]);

on my Pentium M, this takes ~30000 cycles in the case where both arrays 
are equal.
It's not hard to write an optimal asm routine; this takes 4000 cycles 
(7.5 times faster).
But, it's also possible to perform the comparison in simple D code, 
something like:

int compareArrays(uint [] x, uint y[])
{
     for (int i=0; i<x.length; ++i) {
         if (x[i] != y[i]) return x[i]-y[i];
     }
     return 0;
}

and this takes 6000 cycles when compiled with -release -O, 5X better 
than the built-in version.

So, why is the built-in compare so slow? Using the CPU performance counters:
X[]==Y[] --> 10000 branches, 35000 memory loads, 30000 cycles.
for loop --> 4000 branches, 6000 memory loads, 6000 cycles.
asm      --> 4000 branches, 4000 memory loads, 4000 cycles.

What's it doing? Is it doing a byte-by-byte comparison? (it shouldn't do 
that even for strings). Is it doing bounds checking for every array 
element?? (even with -release -O)

There's significant optimisation potential here.