String compare performance
spir
denis.spir at gmail.com
Sat Nov 27 14:04:29 PST 2010
On Sat, 27 Nov 2010 11:53:06 -0500
bearophile <bearophileHUGS at lycos.com> wrote:
> Timings, dmd compiler, best of 4, seconds:
> D #1: 5.72
> Psy: 1.59
> D #2: 0.55
> D #3: 0.34
>
>
> 2.3 GHz CPU.
> dmd 2.050, ldc based on DMD v1.057 and llvm 2.6, gdc head based on GCC 4.4.5.
>
> --------------------------
>
> # Python2 version
> import psyco
>
> def test(data):
> count = 0
> for i in xrange(len(data) - 3):
> codon = data[i : i + 3]
> if codon == "TAG" or codon == "TGA" or codon == "TAA":
> count += 1
> return count
>
> def main():
> data = open("data.txt").read()
> data = data * 300
> print test(data)
>
> psyco.full()
> main()
>
> --------------------------
>
> To produce some test data:
>
>
> # Python2 code
> from itertools import islice
>
> def rnd(ia = 3877, ic = 29573, im = 139968):
> seed = 42
> imf = float(im)
> while 1:
> seed = (seed * ia + ic) % im
> r = seed / imf
> yield "ACGT"[int(r * 4)]
>
>
> def main():
> data = "".join(islice(rnd(), 0, 120000))
> file("data.txt", "w").write(data)
>
> main()
>
> --------------------------
>
> A D2 traslation (I have used printf to reduce a lot the asm produced):
>
>
> // D2 version #1
> import std.file: read;
> import std.c.stdio: printf;
>
> int test(char[] data) {
> int count;
> foreach (i; 0 .. data.length - 3) {
> char[] codon = data[i .. i + 3];
> if (codon == "TAG" || codon == "TGA" || codon == "TAA")
> count++;
> }
> return count;
> }
>
> void main() {
> char[] data0 = cast(char[])read("data.txt");
> int n = 300;
> char[] data = new char[data0.length * n];
> for (size_t pos; pos < data.length; pos += data0.length)
> data[pos .. pos+data0.length] = data0;
>
> printf("%d\n", test(data));
> }
>
> --------------------------
>
> // D2 version #2
> import std.file: read;
> import std.c.stdio: printf;
>
> bool cmp3(char[] s1, string s2) {
> return s1[0] == s2[0] && s1[1] == s2[1] && s1[2] == s2[2];
> }
>
> int test(char[] data) {
> int count;
> foreach (i; 0 .. data.length - 3) {
> char[] codon = data[i .. i + 3];
> if (cmp3(codon, "TAG") || cmp3(codon, "TGA") || cmp3(codon, "TAA"))
> count++;
> }
> return count;
> }
>
> void main() {
> char[] data0 = cast(char[])read("data.txt");
> int n = 300;
> char[] data = new char[data0.length * n];
> for (size_t pos; pos < data.length; pos += data0.length)
> data[pos .. pos+data0.length] = data0;
>
> printf("%d\n", test(data));
> }
Impressive!
What's your diagnostic on why basic string compare parforms the way it does?
And have you tried a naive hand-written strComp (just for the sake of completeness) like:
bool strComp (string s1, string s2) {
auto l = s1.length;
if (l != s2.length)
return false;
foreach (uint i ; 0..l)
if (s1[i] != s2[i])
return false;
return true;
}
(I bet this will perform at about the same order of magnitude as your version D#1.)
Also, is there a way to bit-compare given memory areas at much higher speed than element per element (I mean for arrays in general)?
Denis
-- -- -- -- -- -- --
vit esse estrany ☣
spir.wikidot.com
More information about the Digitalmars-d
mailing list