Bad performance of simple regular expression - why??

Benji Smith dlanguage at benjismith.net
Fri Feb 9 10:11:28 PST 2007


Jascha Wetzel wrote:
> Manfred Nowak wrote:
>> Also for maximal munching several regular expression there is no need 
>> for backtracking. 
> 
> maximal munch, non-greedy matching and negative matches would usually
> use backtracking. doing it without backtracking would require to use
> several DFAs - or is there a simpler method?
> 
> afaik, the "school-book-way" for backtracking are deterministic pushdown
> automatons (PDAs) that can also be interpreted efficiently or compiled
> to bytecode.
> so i don't see why there shouldn't be backtracking in the regexp
> implementation.

The problem is that most regex engines (in fact, all of them, as far as 
I know) use backtracking even when a DFA implementation could be used to 
determine (much earlier in the search process) that a match is impossible.

A backtracking approach might be necessary with positive and negative 
lookaround assertions (though I still suspect that a DFA could even be 
used for these cases), but every mainstream regex engine uses a 
backtracking approach even for simple choice situations like (cat|car) 
where a DFA would be much more appropriate.

And, by the way, it's possible to create a DFA implementation that 
degrades into a backtracking matcher for certain subexpressions.

--benji



More information about the Digitalmars-d mailing list