Code improvement for DNA reverse complement?
Biotronic via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Fri May 19 05:55:05 PDT 2017
On Friday, 19 May 2017 at 12:21:10 UTC, biocyberman wrote:
>
> 1. Why do we need to use assumeUnique in 'revComp0' and
> 'revComp3'?
D strings are immutable, so if I'd created the result array as a
string, I couldn't change the individual characters. Instead, I
create a mutable array, change the elements in it, then cast it
to immutable when I'm done. assumeUnique does that casting while
keeping other type information and arguably providing better
documentation through its name. Behind the scenes, it's basically
doing cast(string)result;
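A minimal sketch of the same build-then-freeze pattern, using a hypothetical toUpperAscii helper that is not from this thread. The buffer is mutable while we fill it, and assumeUnique hands it back as a string. Per the std.exception docs, assumeUnique also clears the mutable slice it was given, which the assert inside the function checks.

```d
import std.exception : assumeUnique;

// Hypothetical helper: uppercase ASCII letters, leave everything else alone.
string toUpperAscii(string s) {
    char[] buf = s.dup;                        // mutable copy; s itself is immutable
    foreach (ref c; buf) {
        if (c >= 'a' && c <= 'z')
            c = cast(char)(c - ('a' - 'A'));   // edit the character in place
    }
    string frozen = buf.assumeUnique;          // reinterpret as immutable(char)[]
    assert(buf is null);                       // the mutable alias has been cleared
    return frozen;
}
```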
> 2. What is going on with the trick of making chars enum like
> that in 'revComp3'?
By marking a symbol enum, we tell the compiler that its value
should be calculated at compile-time. It's a bit of an
optimization (but probably doesn't matter at all, and should be
done by the compiler anyway), and a way to say it's really,
really const. :p
Mostly, though, it's a habit I'm trying to build: declaring
symbols as const as possible, to make maintenance easier.
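As a small illustration of compile-time evaluation (the names complementBase, complementOf and tgca are hypothetical, not the chars enum from revComp3), an ordinary function is run by CTFE when its result initialises an enum, so the static assert below is checked by the compiler rather than at run time.

```d
// Complement of a single base; an ordinary function usable at run time or compile time.
char complementBase(char c) {
    switch (c) {
        case 'A': return 'T';
        case 'C': return 'G';
        case 'G': return 'C';
        case 'T': return 'A';
        default: assert(false, "not a DNA base");
    }
}

// Complement (not reversed) of a whole sequence.
string complementOf(string bps) {
    char[] buf = new char[bps.length];
    foreach (i, c; bps)
        buf[i] = complementBase(c);
    return buf.idup;
}

// Because it initialises an enum, this call is evaluated at compile time.
enum string tgca = complementOf("ACGT");
static assert(tgca == "TGCA");   // verified by the compiler, no run-time cost
```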
Bonus! Three more variations, all faster than revComp0:
import std.exception : assumeUnique;

string revComp4(string bps) {
    const N = bps.length;
    char[] result = new char[N];
    // size_t matches bps.length; an int index could overflow on huge inputs
    for (size_t i = 0; i < N; ++i) {
        switch (bps[N-i-1]) {
            case 'A': result[i] = 'T'; break;
            case 'C': result[i] = 'G'; break;
            case 'G': result[i] = 'C'; break;
            case 'T': result[i] = 'A'; break;
            default: assert(false);
        }
    }
    return result.assumeUnique;
}
string revComp5(string bps) {
    const N = bps.length;
    char[] result = new char[N];
    // e is a reference into result, so assigning to it fills the output
    foreach (i, ref e; result) {
        switch (bps[N-i-1]) {
            case 'A': e = 'T'; break;
            case 'C': e = 'G'; break;
            case 'G': e = 'C'; break;
            case 'T': e = 'A'; break;
            default: assert(false);
        }
    }
    return result.assumeUnique;
}
string revComp6(string bps) {
    char[] result = new char[bps.length];
    auto p1 = result.ptr;              // write cursor, moves forward
    auto p2 = bps.ptr + bps.length;    // one past the last input char (safe for "")
    while (p2 > bps.ptr) {
        --p2;                          // step back first, so bps[0] is included
        switch (*p2) {
            case 'A': *p1 = 'T'; break;
            case 'C': *p1 = 'G'; break;
            case 'G': *p1 = 'C'; break;
            case 'T': *p1 = 'A'; break;
            default: assert(false);
        }
        ++p1;
    }
    return result.assumeUnique;
}
revComp6 seems to be the fastest, but it's probably also the
least readable (a common trade-off).
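For comparison, here is a sketch of the readable end of that trade-off: a hypothetical range-based revCompRange (not from the original post, and not benchmarked here). byCodeUnit stops retro from decoding UTF-8, which is fine because DNA strings are plain ASCII.

```d
import std.algorithm.iteration : map;
import std.array : array;
import std.exception : assumeUnique;
import std.range : retro;
import std.utf : byCodeUnit;

// Complement of one base; mirrors the switch used in revComp4-6.
char complementBase(char c) {
    switch (c) {
        case 'A': return 'T';
        case 'C': return 'G';
        case 'G': return 'C';
        case 'T': return 'A';
        default: assert(false, "not a DNA base");
    }
}

// Walk the code units backwards, complement each one, and collect the
// results into a freshly allocated char[] before freezing it.
string revCompRange(string bps) {
    char[] result = bps.byCodeUnit.retro.map!complementBase.array;
    return result.assumeUnique;
}
```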