String compare performance

bearophile bearophileHUGS at lycos.com
Sun Nov 28 07:48:46 PST 2010


Pelle Månsson:

> I don't have your data set,

In my first post I have explained how to create the data set, using a little Python script that I have listed there. You just need a Python2 interpreter.


> but for me using random data this was within a factor 2 of your #3, without any fiddly code.

Thank you for your code. I have added your version, plus a modified version that works on ubytes, because it's faster, #9 and #10:

-------------------------

// D version #9
import std.file: read;
import std.c.stdio: printf;
import std.algorithm: find;
import std.array: empty, popFront;

int test(char[] data) {
    int count;
    while (true) {
        data = data.find("TAG", "TGA", "TAA")[0];
        if (data.empty)
            return count;
         count++;
         data.popFront();
     }
}

void main() {
    char[] data0 = cast(char[])read("data.txt");
    int n = 300;
    char[] data = new char[data0.length * n];
    for (size_t pos; pos < data.length; pos += data0.length)
        data[pos .. pos + data0.length] = data0;

    printf("%d\n", test(data));
}

-------------------------

// D version #10
import std.file: read;
import std.c.stdio: printf;
import std.algorithm: find;
import std.array: empty, popFront;

int test(ubyte[] data) {
    int count;
    while (true) {
        data = data.find("TAG", "TGA", "TAA")[0];
        if (data.empty)
            return count;
         count++;
         data.popFront();
     }
}

void main() {
    ubyte[] data0 = cast(ubyte[])read("data.txt");
    int n = 300;
    ubyte[] data = new ubyte[data0.length * n];
    for (size_t pos; pos < data.length; pos += data0.length)
        data[pos .. pos + data0.length] = data0;

    printf("%d\n", test(data));
}

-------------------------

Timings, dmd compiler, best of 4, seconds:
  D #1:  5.72
  D #9:  5.04
  D #10: 3.31
  D #4:  1.84
  D #5:  1.73
  Psy:   1.59
  D #8:  1.51
  D #7:  0.56
  D #2:  0.55
  D #6:  0.47  
  D #3:  0.34


So on this PC it's not much fast. And generally I don't like to use find, empty and popFront to solve this problem, it's quite unnatural. Here I have explained what I think is a good enough solution to this performance problem:
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=123044

Bye,
bearophile


More information about the Digitalmars-d mailing list