Branch Prediction strange results
bearophile
bearophileHUGS at lycos.com
Wed Nov 12 12:09:37 PST 2008
Don:
> Are you running it on a Pentium 4? Pentium 4 has *horrific* branch
> misprediction (minimum 24 cycles, 45 uops). No other processor is nearly
> as bad, eg it's 15 cycles on Core2; it was just 4 cycles on PMMX.
Sorry, I am using a Core2 @ 2GHz.
The fixed C code with timings:
#include <stdio.h>

//#define FIRST

int main() {
    int counter0 = 0, counter1 = 0, counter2 = 0, counter3 = 0;
    int i = 300000000;
    while (i--) {
#ifdef FIRST
        // 0.63 s
        if (i % 4 == 0) {
            counter0++;
        } else if (i % 4 == 1) {
            counter1++;
        } else if (i % 4 == 2) {
            counter2++;
        } else {
            counter3++;
        }
#else
        // 0.66 s
        if (i & 2) {
            if (i & 1) {
                counter3++;
            } else {
                counter2++;
            }
        } else {
            if (i & 1) {
                counter1++;
            } else {
                counter0++;
            }
        }
#endif
    }
    printf("%d %d %d %d\n", counter0, counter1, counter2, counter3);
    return 0;
}
Fixed D code with timings:
void main() {
    int counter0, counter1, counter2, counter3;
    int i = 300000000;
    while (i--) {
        static if (0) { // 1.24 s
            if (i % 4 == 0) {
                counter0++;
            } else if (i % 4 == 1) {
                counter1++;
            } else if (i % 4 == 2) {
                counter2++;
            } else {
                counter3++;
            }
        } else { // 1.01 s
            if (i & 2) {
                if (i & 1) {
                    counter3++;
                } else {
                    counter2++;
                }
            } else {
                if (i & 1) {
                    counter1++;
                } else {
                    counter0++;
                }
            }
        }
    }
    printf("%d %d %d %d\n", counter0, counter1, counter2, counter3);
}
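As you can see, the C version (GCC 4.2.1-dw2) is roughly 1.5 to 2 times faster than the D one (0.63 s vs 1.24 s for the scan, 0.66 s vs 1.01 s for the binary search). The C version also shows the linear scan as slightly faster than the binary search, as the article I linked says.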
Bye,
bearophile