String compare performance
bearophile
bearophileHUGS at lycos.com
Sun Nov 28 03:00:49 PST 2010
Austin Hastings:
> It's not clear to me if the point of your post is "how do I make this go
> faster?" or "Waa! D underperforms Python by default".
The third D version in my post is fast enough, so the point of that post is that a very common string-compare operation, written in the natural way, is a bit too slow in D/DMD.
> If it is the former, a state machine is probably the right answer for
> long input strings.
This case is very simple, so a simple tree (as in the third D version of my post) is probably good enough.
In some similar but more complex genomic-processing situations I've seen that in C it's good to have computed gotos, to implement faster state machines; see my recent post "explicitly unimplemented computed gotos" :-) That was my use case, besides the desire to write interpreters in D (a similar problem).
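A minimal sketch of the computed-goto idea, written in C since that's where the extension lives (labels-as-values, `&&label` and `goto *p`, is a GCC/Clang extension, not standard C). The three-state machine and the `count_tga` name are illustrative assumptions, not from the thread; it counts occurrences of one codon in a sequence, dispatching each state with a single indirect jump instead of a switch:

```c
#include <stddef.h>

/* Count non-overlapping occurrences of "TGA" in s, using a state
 * machine whose current state is the address of its handler label
 * (GCC/Clang labels-as-values extension). */
static size_t count_tga(const char *s)
{
    void *state = &&start;      /* &&label takes a label's address */
    size_t hits = 0;
    char c;

next:
    c = *s++;
    if (c == '\0')
        return hits;
    goto *state;                /* one indirect jump per character */

start:                          /* nothing matched yet */
    state = (c == 'T') ? &&saw_t : &&start;
    goto next;

saw_t:                          /* seen "T" */
    state = (c == 'G') ? &&saw_tg
          : (c == 'T') ? &&saw_t
          : &&start;
    goto next;

saw_tg:                         /* seen "TG" */
    if (c == 'A')
        hits++;
    state = (c == 'T') ? &&saw_t : &&start;
    goto next;
}
```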
> Since the probability of failure is about .75, 0.5, {.5 or .75}, the
> overall comparison cost is about 1 + .25 + .125 = 1.375, times
> compare+loop times n, plus all the function call overhead, etc.
All 3 codons start with T, so testing that first letter before the three string compares speeds up the original code a lot. But I was looking for a more general solution. Implementing a state machine is good, but in this case I'd like the compiler to do more of the optimization by itself. A simple but good-enough compiler optimization is to replace comparisons against strings known at compile time to be short with inlined comparisons of their individual chars.
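A sketch of that rewrite in C: the `is_stop_codon_*` names are invented here, and the guess that the three codons are the stop codons TAA/TAG/TGA comes only from the hint that all three start with T, not from anything stated in the thread. The second function is the form a compiler could produce automatically from the first:

```c
#include <string.h>

/* The "natural" version: three short fixed-string compares. */
static int is_stop_codon_slow(const char *c)
{
    return strncmp(c, "TAA", 3) == 0 ||
           strncmp(c, "TAG", 3) == 0 ||
           strncmp(c, "TGA", 3) == 0;
}

/* Same test with the shared first letter hoisted out and the
 * remaining compares inlined as single-char tests. */
static int is_stop_codon_fast(const char *c)
{
    if (c[0] != 'T')
        return 0;               /* rejects ~75% of random codons */
    if (c[1] == 'A')
        return c[2] == 'A' || c[2] == 'G';   /* TAA, TAG */
    return c[1] == 'G' && c[2] == 'A';       /* TGA */
}
```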
> Using a DFA should reduce the cost to 1 * index, times n, assuming you
> are using a small alphabet (I'm guessing that your codons are
> ascii+uppercase only, table-size = 26).
In this case the alphabet was just 4 symbols: {'A', 'C', 'G', 'T'} (with no undecided or modified nucleosides).
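For illustration, a table-driven DFA over that 4-symbol alphabet might look like the following C sketch (again assuming the codons are TAA/TAG/TGA; the state names and layout are mine, not from the thread). Each input character costs exactly one table lookup, matching the "1 * index, times n" cost Austin describes:

```c
/* State IDs; HIT and FAIL are absorbing. */
enum { START, T_, TA, TG, HIT, FAIL, NSTATES };

/* Map A,C,G,T to column indices 0..3 (input assumed to be ACGT only). */
static int sym(char c)
{
    switch (c) {
    case 'A': return 0;
    case 'C': return 1;
    case 'G': return 2;
    default:  return 3;         /* 'T' */
    }
}

/* Return 1 if the 3-char window is one of TAA, TAG, TGA. */
static int match3(const char *codon)
{
    static const unsigned char delta[NSTATES][4] = {
        /*           A     C     G     T   */
        [START] = { FAIL, FAIL, FAIL, T_  },
        [T_]    = { TA,   FAIL, TG,   FAIL },
        [TA]    = { HIT,  FAIL, HIT,  FAIL },  /* TAA, TAG */
        [TG]    = { HIT,  FAIL, FAIL, FAIL },  /* TGA */
        [HIT]   = { FAIL, FAIL, FAIL, FAIL },
        [FAIL]  = { FAIL, FAIL, FAIL, FAIL },
    };
    int s = START;
    for (int i = 0; i < 3; i++)
        s = delta[s][sym(codon[i])];   /* one table index per char */
    return s == HIT;
}
```

Since the alphabet is only 4 symbols, the whole table fits in 24 bytes rather than the 26-column worst case.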
Bye,
bearophile
More information about the Digitalmars-d mailing list