String compare performance

Pelle Månsson pelle.mansson at gmail.com
Sun Nov 28 09:59:21 PST 2010


On 11/28/2010 04:48 PM, bearophile wrote:
> Pelle Månsson:
>
>> I don't have your data set,
>
> In my first post I have explained how to create the data set, using a little Python script that I have listed there. You just need a Python2 interpreter.
>
>
>> but for me using random data this was within a factor 2 of your #3, without any fiddly code.
>
> Thank you for your code. I have added your version, plus a modified version that works on ubytes, because it's faster, #9 and #10:
>
> -------------------------
>
> // D version #9
> import std.file: read;
> import std.c.stdio: printf;
> import std.algorithm: find;
> import std.array: empty, popFront;
>
> int test(char[] data) {
>      int count;
>      while (true) {
>          data = data.find("TAG", "TGA", "TAA")[0];
>          if (data.empty)
>              return count;
>           count++;
>           data.popFront();
>       }
> }
>
> void main() {
>      char[] data0 = cast(char[])read("data.txt");
>      int n = 300;
>      char[] data = new char[data0.length * n];
>      for (size_t pos; pos<  data.length; pos += data0.length)
>          data[pos .. pos + data0.length] = data0;
>
>      printf("%d\n", test(data));
> }
>
> -------------------------
>
> // D version #10
> import std.file: read;
> import std.c.stdio: printf;
> import std.algorithm: find;
> import std.array: empty, popFront;
>
> int test(ubyte[] data) {
>      int count;
>      while (true) {
>          data = data.find("TAG", "TGA", "TAA")[0];
>          if (data.empty)
>              return count;
>           count++;
>           data.popFront();
>       }
> }
>
> void main() {
>      ubyte[] data0 = cast(ubyte[])read("data.txt");
>      int n = 300;
>      ubyte[] data = new ubyte[data0.length * n];
>      for (size_t pos; pos<  data.length; pos += data0.length)
>          data[pos .. pos + data0.length] = data0;
>
>      printf("%d\n", test(data));
> }
>
> -------------------------
>
> Timings, dmd compiler, best of 4, seconds:
>    D #1:  5.72
>    D #9:  5.04
>    D #10: 3.31
>    D #4:  1.84
>    D #5:  1.73
>    Psy:   1.59
>    D #8:  1.51
>    D #7:  0.56
>    D #2:  0.55
>    D #6:  0.47
>    D #3:  0.34
>
>
> So on this PC it's not much fast. And generally I don't like to use find, empty and popFront to solve this problem, it's quite unnatural. Here I have explained what I think is a good enough solution to this performance problem:
> http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=123044
>
> Bye,
> bearophile

Measuring incorrectly in a performance test is a silly mistake, I was 
indeed wrong about the factor two thing!

I thought find was specialized for chars so it wouldn't require casting 
to ubyte. Maybe I was wrong again.

Thank you for your engaging performance oriented posts. :-)


More information about the Digitalmars-d mailing list