String compare performance

Sat Nov 27 19:08:29 PST 2010

I have done another test:

Timings, dmd compiler, best of 4, seconds:
  D #1: 5.72
  D #4: 1.84
  D #5: 1.73
  Psy:  1.59
  D #2: 0.55
  D #6: 0.47
  D #3: 0.34

import std.file: read;
import core.stdc.stdio: printf; // std.c.stdio is the older, since-deprecated name

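// Counts the codons TAG, TGA and TAA using inlined single-char comparisons.
int test(char[] data) {
    int count;
    // Note: assumes data.length >= 3 (the subtraction is unsigned), and the
    // exclusive upper bound skips the final window starting at data.length - 3.
    foreach (i; 0 .. data.length - 3) {
        char[] codon = data[i .. i + 3];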
        if ((codon.length == 3 && codon[0] == 'T' && codon[1] == 'A' && codon[2] == 'G') ||
            (codon.length == 3 && codon[0] == 'T' && codon[1] == 'G' && codon[2] == 'A') ||
            (codon.length == 3 && codon[0] == 'T' && codon[1] == 'A' && codon[2] == 'A'))
            count++;
    }
    return count;
}

void main() {
    char[] data0 = cast(char[])read("data.txt");
    // Repeat the file contents n times to get an input long enough to time.
    int n = 300;
    char[] data = new char[data0.length * n];
    for (size_t pos; pos < data.length; pos += data0.length)
        data[pos .. pos + data0.length] = data0;

    printf("%d\n", test(data));
}

So when a string has to be compared against strings known at compile time to be short (say, fewer than 6 chars), the comparison should be replaced with inlined single-char comparisons. This makes the code longer, so it increases instruction cache pressure, but seeing how much slower the alternative is, I think it's an improvement.
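
Until a compiler does this rewrite by itself, it can be approximated by hand with a template. What follows is only a rough sketch, not something taken from Phobos or from the timed versions above: equalsShort and countStopCodons are hypothetical names, and it assumes a recent D compiler that supports static foreach. The literal is a template argument, so the loop over its chars is unrolled at compile time into one comparison per char. Unlike test() above, this loop also visits the last window and returns 0 for inputs shorter than 3 chars. I have not timed it.

```d
// Hypothetical helper (not part of Phobos): compares `s` against the
// compile-time literal `lit` with one inlined comparison per char.
bool equalsShort(string lit)(const(char)[] s) pure nothrow @nogc @safe {
    if (s.length != lit.length)
        return false;
    // Unrolled at compile time: one `if` per char of the literal.
    static foreach (i; 0 .. lit.length) {
        if (s[i] != lit[i])
            return false;
    }
    return true;
}

// Hypothetical counterpart of test() built on the helper above.
int countStopCodons(const(char)[] data) pure nothrow @nogc @safe {
    // Guard against unsigned underflow in the loop bound below.
    if (data.length < 3)
        return 0;
    int count;
    foreach (i; 0 .. data.length - 2) {
        auto codon = data[i .. i + 3];
        if (equalsShort!"TAG"(codon) ||
            equalsShort!"TGA"(codon) ||
            equalsShort!"TAA"(codon))
            count++;
    }
    return count;
}
```

The call site stays as readable as a plain string comparison (equalsShort!"TAG"(codon)), while the generated code has the same shape as the hand-written char tests.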

(A smart compiler can even remove the codon.length == 3 tests, because the slice data[i .. i + 3] always has length 3. Mysteriously, if you remove those three length tests by hand, the program compiled with dmd gets slower.)

Bye,
bearophile