Branch Prediction strange results

bearophile bearophileHUGS at lycos.com
Wed Nov 12 12:09:37 PST 2008


Don:
> Are you running it on a Pentium 4? Pentium 4 has *horrific* branch 
> misprediction (minimum 24 cycles, 45 uops). No other processor is nearly 
> as bad, eg it's 15 cycles on Core2; it was just 4 cycles on PMMX.

Sorry, I am using a Core2 @ 2GHz.
The fixed C code with timings:

#include <stdio.h>

//#define FIRST

int main() {
    int counter0 = 0, counter1 = 0, counter2 = 0, counter3 = 0;
    int i = 300000000;
    while (i--) {
        #ifdef FIRST
            // Linear scan over i % 4: 0.63 s
            if (i % 4 == 0) {
                counter0++;
            } else if (i % 4 == 1) {
                counter1++;
            } else if (i % 4 == 2) {
                counter2++;
            } else {
                counter3++;
            }
        #else
            // Binary search on the two low bits of i: 0.66 s
            if (i & 2) {
                if (i & 1) {
                    counter3++;
                } else {
                    counter2++;
                }
            } else {
                if (i & 1) {
                    counter1++;
                } else {
                    counter0++;
                }
            }
        #endif
    }

    printf("%d %d %d %d\n", counter0, counter1, counter2, counter3);
    return 0;
}


Fixed D code with timings:

import std.stdio; // makes printf visible

void main() {
    int counter0, counter1, counter2, counter3;

    int i = 300000000;
    while (i--)
        static if (0) { // linear scan over i % 4: 1.24 s
            if (i % 4 == 0) {
                counter0++;
            } else if (i % 4 == 1) {
                counter1++;
            } else if (i % 4 == 2) {
                counter2++;
            } else {
                counter3++;
            }
        } else { // binary search on the two low bits of i: 1.01 s
            if (i & 2) {
                if (i & 1) {
                    counter3++;
                } else {
                    counter2++;
                }
            } else {
                if (i & 1) {
                    counter1++;
                } else {
                    counter0++;
                }
            }
        }

    printf("%d %d %d %d\n", counter0, counter1, counter2, counter3);
}

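As you can see, the C version (GCC 4.2.1-dw2) is roughly 1.5 to 2 times as fast as the D one, and it shows the scan as slightly faster than the binary search (0.63 s vs 0.66 s), as the article I linked says. The D version shows the opposite: 1.24 s for the scan vs 1.01 s for the binary search.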
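
For anyone who wants to repeat the comparison without recompiling with and without FIRST, here is a rough sketch that puts both variants into one C file and times each with clock() from <time.h>. The names Counts, count_scan, count_tree and time_variant are made up for this sketch and are not part of the program above, and clock() measures CPU time, which may not match however the timings above were taken. With optimization enabled the compiler may rewrite the branches, so the numbers can differ a lot between flags.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical helper type for this sketch: the four counters. */
typedef struct { int c[4]; } Counts;

/* Linear scan: up to three comparisons against i % 4 per iteration. */
static Counts count_scan(int n) {
    Counts r = {{0, 0, 0, 0}};
    int i = n;
    while (i--) {
        if (i % 4 == 0)      r.c[0]++;
        else if (i % 4 == 1) r.c[1]++;
        else if (i % 4 == 2) r.c[2]++;
        else                 r.c[3]++;
    }
    return r;
}

/* Binary search: exactly two tests on the two low bits per iteration. */
static Counts count_tree(int n) {
    Counts r = {{0, 0, 0, 0}};
    int i = n;
    while (i--) {
        if (i & 2) {
            if (i & 1) r.c[3]++; else r.c[2]++;
        } else {
            if (i & 1) r.c[1]++; else r.c[0]++;
        }
    }
    return r;
}

/* Run one variant, print its counters and CPU time, return the counters. */
static Counts time_variant(const char *name, Counts (*fn)(int), int n) {
    clock_t start = clock();
    Counts r = fn(n);
    double secs = (double)(clock() - start) / CLOCKS_PER_SEC;
    printf("%s: %d %d %d %d (%.2f s)\n",
           name, r.c[0], r.c[1], r.c[2], r.c[3], secs);
    return r;
}
```

From a main() you would call time_variant("scan", count_scan, 300000000) and time_variant("tree", count_tree, 300000000). Both should print the same four counters, so only the times differ.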

Bye,
bearophile

