String compare performance

Sat Nov 27 22:13:39 PST 2010

On 11/27/2010 11:53 AM, bearophile wrote:
> While translating a common Python script to D, I have found something interesting, so I have reduced it to a little benchmark.
>
> Below there is the reduced Python2 code (it uses Psyco), and a little program to generate some test data. The timing of the first D2 version is not good compared to the Python-Psyco program (the generation of the *300 array is a quick thing), so I have created two more D2 versions to show myself that D2 wasn't broken :-)
>
> The reduced code looks like a syntetic benchmark, but it has about the same performance profile of a 60 lines long Python script (the original code was using xrange(0,len(...)-3,3)  instead of xrange(len(...)-3), but the situation doesn't change much).
>

It's not clear to me if the point of your post is "how do I make this go 
faster?" or "Waa! D underperforms Python by default".

If it is the former, a state machine is probably the right answer for 
long input strings.

start -> E

E A,C,G:-> E
E T:-> T

T A:-> TA
T C:-> E
T G:-> TG
T T:-> T

TA A,G:-> E, {++count}
TA C:-> E
TA T:-> T

TG A:-> E, {++count}
TG C,G:-> E
TG T:-> T

Since the probability of failure is about .75, 0.5, {.5 or .75}, the 
overall comparison cost is about 1 + .25 + .125 = 1.375, times 
compare+loop times n, plus all the function call overhead, etc.

Using a DFA should reduce the cost to 1 * index, times n, assuming you 
are using a small alphabet (I'm guessing that your codons are 
ascii+uppercase only, table-size = 26).

=Austin