regex direct support for sse4 intrinsics

Tove tove at fransson.se
Tue Mar 27 14:30:45 PDT 2012


On Tuesday, 27 March 2012 at 09:51:07 UTC, bearophile wrote:
> Dmitry Olshansky:
>
>> Speaking more of run-time version of regex, it is essentially 
>> running a VM that executes instructions that do various kinds 
>> of match-this, match-that. The VM dispatch code is quite slow, 
>> the optimal _threaded_ code requires either doing it in 
>> _assembly_ or _computed_ goto in the language. The VM 
>> _dispatch_ takes up to 30% of time in the default matcher.
>
> I have used computed gotos in GCC-C to implement some quite 
> efficient finite state machines to be used in computational 
> biology. I've seen 20%+ speedups compared to my alternative 
> switch-based implementation. So I'd like computed gotos in D 
> too.
>

While I am in favor of all enhancements which improve low-level
access, I'm very surprised by your findings on computed gotos.
The compiler I am most used to (RVCT for ARM) seems proficient
at emitting jump-table instructions (TBB, TBH) for Thumb-2, but
based on your findings I will definitely re-check the generated
asm. Could it be that the compiler's heuristics are simply less
than optimal, and that an alternative would be to force a
specific implementation with a pragma, or with the recent
@annotation syntax? Something like:

pragma(switch, "jumptable")
pragma(switch, "binary-search-tree")

It would have the benefit of not having to refactor the code, and
one could easily benchmark which solution is the fastest for
different inputs.
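
For comparison, here is a minimal sketch in C of the two dispatch
styles being discussed, which is roughly what one has to write by
hand today in order to benchmark them. Everything in it is an
assumption made for illustration: the opcodes, the toy stack
machine, and the names run_switch, run_threaded and sample_program
are hypothetical and have nothing to do with the actual std.regex
VM. The threaded variant relies on the GCC/Clang "labels as values"
extension and falls back to the switch version on other compilers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for a toy stack machine (illustration only). */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

/* Hypothetical sample program: computes (2 + 3) * 4. */
const int sample_program[] = {
    OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PUSH, 4, OP_MUL, OP_HALT
};

/* Variant 1: switch dispatch. The compiler decides whether this
 * becomes a jump table or a chain of compares. */
long run_switch(const int *code)
{
    long stack[16];
    size_t sp = 0;
    for (;;) {
        switch (*code++) {
        case OP_PUSH: assert(sp < 16); stack[sp++] = *code++; break;
        case OP_ADD:  assert(sp >= 2); sp--; stack[sp - 1] += stack[sp]; break;
        case OP_MUL:  assert(sp >= 2); sp--; stack[sp - 1] *= stack[sp]; break;
        case OP_HALT: assert(sp >= 1); return stack[sp - 1];
        default:      return -1; /* unknown opcode */
        }
    }
}

/* Variant 2: threaded dispatch. Each handler jumps straight to the
 * next handler through a table of label addresses. Unlike the
 * switch, this sketch does not validate opcodes. */
long run_threaded(const int *code)
{
#if defined(__GNUC__)
    static void *const labels[] = { &&do_push, &&do_add, &&do_mul, &&do_halt };
    long stack[16];
    size_t sp = 0;
#define DISPATCH() goto *labels[*code++]
    DISPATCH();
do_push: assert(sp < 16); stack[sp++] = *code++; DISPATCH();
do_add:  assert(sp >= 2); sp--; stack[sp - 1] += stack[sp]; DISPATCH();
do_mul:  assert(sp >= 2); sp--; stack[sp - 1] *= stack[sp]; DISPATCH();
do_halt: assert(sp >= 1); return stack[sp - 1];
#undef DISPATCH
#else
    return run_switch(code); /* no labels-as-values support */
#endif
}
```

Both functions should return the same value for the same program,
so timing them on identical input gives a direct comparison. With
a pragma as suggested above, only the first variant would need to
exist and the strategy could be changed in a single line.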


