Code improvement for DNA reverse complement?

Biotronic via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Fri May 19 05:55:05 PDT 2017


On Friday, 19 May 2017 at 12:21:10 UTC, biocyberman wrote:
>
> 1. Why do we need to use assumeUnique in 'revComp0' and 
> 'revComp3'?

D strings are immutable(char)[], so if I'd created the result
array as a string, I couldn't change the individual characters.
Instead, I create a mutable array, change the elements in it,
then cast it to immutable when I'm done. assumeUnique does that
casting while keeping other type information and arguably
providing better documentation through its name. Behind the
scenes, it's basically doing cast(string)result; and then setting
result to null, so the mutable reference can't be used afterwards.
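
As a minimal sketch of that build-then-freeze pattern (repeatChar is a hypothetical helper made up for illustration, not something from this thread):

```d
import std.exception : assumeUnique;

// Hypothetical helper: returns a string made of n copies of c.
string repeatChar(char c, size_t n) {
    char[] buf = new char[n];   // mutable while we fill it in
    foreach (ref e; buf)
        e = c;
    // buf is the only reference to this memory, so it is safe to
    // treat the contents as immutable from here on.
    return buf.assumeUnique;
}

void main() {
    string s = repeatChar('A', 4);
    assert(s == "AAAA");
    // s[0] = 'T';  // would not compile: string elements are immutable

    char[] tmp = new char[2];
    tmp[] = 'G';
    string t = assumeUnique(tmp);
    assert(t == "GG");
    assert(tmp is null);        // the mutable reference has been cleared
}
```

The promise that nothing else holds a mutable reference is on the caller; assumeUnique doesn't check it.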

> 2. What is going on with the trick of making chars enum like 
> that in 'revComp3'?

By declaring a symbol as enum, we make it a manifest constant:
its value has to be calculated at compile-time. It's a bit of an
optimization (but probably doesn't matter at all, and should be
done by the compiler anyway), and a way to say it's really,
really const. :p

Mostly, it's a habit I try to build, of declaring symbols as 
const as possible, to make maintenance easier.
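
To illustrate the difference between enum and plain const, here is a small sketch. The names complement, compA, alphabetSize and buffer are made up for this example and are not taken from revComp3:

```d
// Hypothetical helper: complement of a single base.
char complement(char base) {
    switch (base) {
        case 'A': return 'T';
        case 'C': return 'G';
        case 'G': return 'C';
        case 'T': return 'A';
        default: assert(false);
    }
}

// enum forces the initializer to be evaluated at compile time (CTFE).
enum char compA = complement('A');
static assert(compA == 'T');     // checked while compiling

// An enum can be used where a compile-time constant is required,
// for example as the length of a static array.
enum size_t alphabetSize = "ACGT".length;
char[alphabetSize] buffer;

void main() {
    // const only promises that the value won't change through this
    // name; it may still be computed at run time.
    const runtimeValue = complement('G');
    assert(runtimeValue == 'C');
    assert(buffer.length == 4);
}
```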


Bonus! Three more variations, all faster than revComp0:

import std.exception : assumeUnique;

string revComp4(string bps) {
     const N = bps.length;
     char[] result = new char[N];
     for (size_t i = 0; i < N; ++i) {
         switch(bps[N-i-1]) {
             case 'A': result[i] = 'T'; break;
             case 'C': result[i] = 'G'; break;
             case 'G': result[i] = 'C'; break;
             case 'T': result[i] = 'A'; break;
             default: assert(false);
         }
     }
     return result.assumeUnique;
}

string revComp5(string bps) {
     const N = bps.length;
     char[] result = new char[N];
     foreach (i, ref e; result) {
         switch(bps[N-i-1]) {
             case 'A': e = 'T'; break;
             case 'C': e = 'G'; break;
             case 'G': e = 'C'; break;
             case 'T': e = 'A'; break;
             default: assert(false);
         }
     }
     return result.assumeUnique;
}

string revComp6(string bps) {
     char[] result = new char[bps.length];
     auto p1 = result.ptr;
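     auto p2 = bps.ptr + bps.length; // one past the last character

     // Step p2 back before reading, so that bps[0] is processed too
     // and an empty input never indexes out of bounds.
     while (p2 > bps.ptr) {
         switch(*--p2) {
             case 'A': *p1 = 'T'; break;
             case 'C': *p1 = 'G'; break;
             case 'G': *p1 = 'C'; break;
             case 'T': *p1 = 'A'; break;
             default: assert(false);
         }
         p1++;
     }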
     }
     return result.assumeUnique;
}

revComp6 seems to be the fastest, but it's probably also the 
least readable (a common trade-off).
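
Since the variants differ only in how they loop, a quick check on a few edge cases (empty input, a single base, a round trip) is a cheap way to confirm that they agree. Below is a rough sketch: check is a hypothetical helper, and revComp is just a stand-in so the example compiles on its own. Any of the variants could be passed instead.

```d
import std.exception : assumeUnique;

// Stand-in implementation so the sketch is self-contained.
string revComp(string bps) {
    char[] result = new char[bps.length];
    foreach (i, ref e; result) {
        switch (bps[$ - i - 1]) {
            case 'A': e = 'T'; break;
            case 'C': e = 'G'; break;
            case 'G': e = 'C'; break;
            case 'T': e = 'A'; break;
            default: assert(false);
        }
    }
    return result.assumeUnique;
}

// Hypothetical checker: f is any function taking and returning a string.
void check(alias f)() {
    assert(f("") == "");                  // empty input
    assert(f("A") == "T");                // single base
    assert(f("AACG") == "CGTT");          // reversed and complemented
    assert(f(f("GATTACA")) == "GATTACA"); // applying it twice is a no-op
}

void main() {
    check!revComp();
}
```

The pointer version is the one most worth running through a check like this, because off-by-one mistakes at either end of the input are easy to make there.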

