Code improvement for DNA reverse complement?

ag0aep6g via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Mon May 22 03:35:36 PDT 2017


On 05/22/2017 10:58 AM, biocyberman wrote:
> @ag0aep6g
>> You fell into a trap there. The value is calculated at compile time,
>> but it has copy/paste-like behavior. That is, whenever you use
>> `chars`, the code behaves as if you typed out the array literal. That
>> means, the whole array is re-created on every iteration.
> 
>> Use `static immutable` instead. It still forces compile-time
>> calculation, but it doesn't have copy/paste behavior. Speeds up
>> revComp3 a lot.
> 
> By 'iteration' here, do you mean the running lifetime of the function,
> or in other words, each one of the 10_000 cycles in the benchmark?

For reference, here is the version of revComp3 I commented on:

----
string revComp3(string bps) {
     const N = bps.length;
     enum chars = [Repeat!('A'-'\0', '\0'), 'T',
                 Repeat!('C'-'A'-1, '\0'), 'G',
                 Repeat!('G'-'C'-1, '\0'), 'C',
                 Repeat!('T'-'G'-1, '\0'), 'A'];

     char[] result = new char[N];
     for (int i = 0; i < N; ++i) {
         result[i] = chars[bps[N-i-1]];
     }
     return result.assumeUnique;
}
----

By "iteration" I mean every execution of the body of the `for` loop. For 
every new `i`, a new array is created.

The loop above is equivalent to this:

----
     for (int i = 0; i < N; ++i) {
         result[i] = [Repeat!('A'-'\0', '\0'), 'T',
                 Repeat!('C'-'A'-1, '\0'), 'G',
                 Repeat!('G'-'C'-1, '\0'), 'C',
                 Repeat!('T'-'G'-1, '\0'), 'A'][bps[N-i-1]];
     }
----

Used like that, the array literal

     [Repeat!('A'-'\0', '\0'), 'T',
     Repeat!('C'-'A'-1, '\0'), 'G',
     Repeat!('G'-'C'-1, '\0'), 'C',
     Repeat!('T'-'G'-1, '\0'), 'A']

allocates a new array on every execution of `result[i] = ...;`.
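
For comparison, here is a sketch of the same function with `static immutable` in place of `enum`. The name `revComp3Static` is made up for this example, and the two imports are my guess at what the original file pulled in. Like the original, the table only covers uppercase A, C, G and T. I also changed the loop counter to `size_t`; that part is not needed for the fix.

```d
import std.exception : assumeUnique;
import std.meta : Repeat;

/* Hypothetical variant of revComp3. Only the declaration of `chars`
differs in substance from the version quoted above. */
string revComp3Static(string bps)
{
    const N = bps.length;
    /* Still computed at compile time, but stored once in static data
    instead of being pasted in as an array literal at every use. */
    static immutable char[] chars = [Repeat!('A'-'\0', '\0'), 'T',
                Repeat!('C'-'A'-1, '\0'), 'G',
                Repeat!('G'-'C'-1, '\0'), 'C',
                Repeat!('T'-'G'-1, '\0'), 'A'];

    char[] result = new char[N];
    for (size_t i = 0; i < N; ++i) {
        /* Plain index into the one static table. */
        result[i] = chars[bps[N-i-1]];
    }
    return result.assumeUnique;
}

void main()
{
    /* "AACG" reversed is "GCAA", complemented is "CGTT". */
    assert(revComp3Static("AACG") == "CGTT");
}
```

Here `chars[...]` indexes into one table that exists for the whole run of the program, so the only allocation left in the function is `new char[N]`.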

> Could you provide some more reading on what you are describing here? I
> can only guess it is intrinsic behavior of an 'enum'.

Unfortunately, the spec page (<https://dlang.org/spec/enum.html>) 
doesn't seem to mention this.

But Ali Çehreli covers it in his book on the "immutability" page (I 
would have expected to find it on the "enum" page):

http://ddili.org/ders/d.en/const_and_immutable.html#ix_const_and_immutable.enum

The details can be confusing here. There is an element of 
copy/paste-like behavior, but it's not as simple as taking the 
right-hand side of the enum declaration and substituting it for the 
left-hand name.

The right-hand side is evaluated at compile time. The result of that can 
be thought of as an array literal. It's that array literal that gets 
substituted for the name.

An example with comments:

----
import std.stdio: writeln;

/* f prints a message when called at run time. Then it returns its
argument times ten. */
int f(int x)
{
     if (!__ctfe) writeln("f(", x, ")");
     return x * 10;
}

void main()
{
     /* The following line prints f's messages. The f calls are normal
     run-time calls. Then the writeln prints "false" because each array
     literal creates a new, distinct array.
     */
     writeln([f(1), f(2)] is [f(1), f(2)]); /* false */

     /* The next `enum` line does not print f's messages. The calls go
     through CTFE.
     The `writeln` line afterwards prints "false". ea gets pre-computed
     via CTFE, but the result acts like an array literal. So it's the
     same as writing `writeln([10, 20] is [10, 20]);`.
     */
     enum int[] ea = [f(1), f(2)];
     writeln(ea is ea); /* false */

     /* No messages either with `static immutable`. Like ea, the
     right-hand side goes through CTFE.
     But unlike ea, ia does not act like an array literal. `writeln`
     prints "true".
     */
     static immutable int[] ia = [f(1), f(2)];
     writeln(ia is ia); /* true */
}
----

