String compare performance

spir denis.spir at gmail.com
Sat Nov 27 14:04:29 PST 2010


On Sat, 27 Nov 2010 11:53:06 -0500
bearophile <bearophileHUGS at lycos.com> wrote:

> Timings, dmd compiler, best of 4, seconds:
>   D #1: 5.72
>   Psy:  1.59
>   D #2: 0.55
>   D #3: 0.34
> 
> 
> 2.3 GHz CPU.
> dmd 2.050, ldc based on DMD v1.057 and llvm 2.6, gdc head based on GCC 4.4.5.
> 
> --------------------------
> 
> # Python2 version
> import psyco
> 
> def test(data):
>     count = 0
>     for i in xrange(len(data) - 3):
>         codon = data[i : i + 3]
>         if codon == "TAG" or codon == "TGA" or codon == "TAA":
>             count += 1
>     return count
> 
> def main():
>     data = open("data.txt").read()
>     data = data * 300
>     print test(data)
> 
> psyco.full()
> main()
> 
> --------------------------
> 
> To produce some test data:
> 
> 
> # Python2 code
> from itertools import islice
> 
> def rnd(ia = 3877, ic = 29573, im = 139968):
>     seed = 42
>     imf = float(im)
>     while 1:
>         seed = (seed * ia + ic) % im
>         r = seed / imf
>         yield "ACGT"[int(r * 4)]
> 
> 
> def main():
>     data = "".join(islice(rnd(), 0, 120000))
>     file("data.txt", "w").write(data)
> 
> main()
> 
> --------------------------
> 
> A D2 traslation (I have used printf to reduce a lot the asm produced):
> 
> 
> // D2 version #1
> import std.file: read;
> import std.c.stdio: printf;
> 
> int test(char[] data) {
>     int count;
>     foreach (i; 0 ..  data.length - 3) {
>         char[] codon = data[i .. i + 3];
>         if (codon == "TAG" || codon == "TGA" || codon == "TAA")
>             count++;
>     }
>     return count;
> }
> 
> void main() {
>     char[] data0 = cast(char[])read("data.txt");
>     int n = 300;
>     char[] data = new char[data0.length * n];
>     for (size_t pos; pos < data.length; pos += data0.length)
>         data[pos .. pos+data0.length] = data0;
> 
>     printf("%d\n", test(data));
> }
> 
> --------------------------
> 
> // D2 version #2
> import std.file: read;
> import std.c.stdio: printf;
> 
> bool cmp3(char[] s1, string s2) {
>     return s1[0] == s2[0] && s1[1] == s2[1] && s1[2] == s2[2];
> }
> 
> int test(char[] data) {
>     int count;
>     foreach (i; 0 ..  data.length - 3) {
>         char[] codon = data[i .. i + 3];
>         if (cmp3(codon, "TAG") || cmp3(codon, "TGA") || cmp3(codon, "TAA"))
>             count++;
>     }
>     return count;
> }
> 
> void main() {
>     char[] data0 = cast(char[])read("data.txt");
>     int n = 300;
>     char[] data = new char[data0.length * n];
>     for (size_t pos; pos < data.length; pos += data0.length)
>         data[pos .. pos+data0.length] = data0;
> 
>     printf("%d\n", test(data));
> }

Impressive!
What's your diagnostic on why basic string compare parforms the way it does?
And have you tried a naive hand-written strComp (just for the sake of completeness) like:

    bool strComp (string s1, string s2) {
        auto l = s1.length;
        if (l != s2.length)
            return false;
        foreach (uint i ; 0..l)
            if (s1[i] != s2[i])
                return false;
        return true;
    }

(I bet this will perform at about the same order of magnitude as your version D#1.)
Also, is there a way to bit-compare given memory areas at much higher speed than element per element (I mean for arrays in general)?


Denis
-- -- -- -- -- -- --
vit esse estrany ☣

spir.wikidot.com



More information about the Digitalmars-d mailing list