Improving std.regex(p)

Ellery Newcomer ellery-newcomer at utulsa.edu
Fri Jun 18 16:23:45 PDT 2010


On 06/18/2010 05:08 PM, sybrandy wrote:
>> I think only backtracking engines (not DFAs) have the exponential run
>> time problem. Regular expression grammar is itself context free (so it's
>> parsable efficiently), the NFA and NFA->DFA algorithms are polynomial,
>> and the generated DFA makes one transition per input character.
>>
>>
>> Andrei

The NFA->DFA algorithms are what I was referring to. My copy of the 
dragon book suggests that building the DFA usually takes O(n^3) time 
but O(n^2 * 2^n) in the worst case, where n is the number of NFA 
states. I'd love to be shown wrong, though.
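
To make the blow-up concrete, here is a throwaway D program (a toy I 
just wrote, nothing to do with std.regex's internals) that runs the 
subset construction on the textbook pathological pattern 
(a|b)*a(a|b){k}. The hand-coded NFA has k+2 states, but the DFA has to 
remember which of the last k+1 characters were 'a', so the number of 
reachable subsets (that is, DFA states) comes out to 2^(k+1):

import std.stdio;

void main()
{
    foreach (k; 1 .. 13)
    {
        // NFA for (a|b)*a(a|b){k} over the alphabet {a, b}:
        // state 0 loops on a and b and also moves to state 1 on 'a';
        // states 1 .. k move forward on either letter; state k+1 accepts.
        immutable nStates = k + 2;

        // One step of the subset construction: advance a set of NFA
        // states (packed into the bits of a ulong) on input character c.
        ulong step(ulong subset, char c)
        {
            ulong next;
            foreach (s; 0 .. nStates)
            {
                if (!(subset & (1UL << s))) continue;
                if (s == 0)
                {
                    next |= 1UL << 0;               // stay in the (a|b)* loop
                    if (c == 'a') next |= 1UL << 1; // or start the "a" branch
                }
                else if (s <= k)
                {
                    next |= 1UL << (s + 1);         // count through (a|b){k}
                }
                // state k+1 (accepting) has no outgoing transitions
            }
            return next;
        }

        // Explore every reachable subset; each one is a DFA state.
        bool[ulong] seen = [1UL: true];   // start subset is {0}
        ulong[] work = [1UL];             // subsets still to expand
        while (work.length)
        {
            ulong cur = work[$ - 1];
            work = work[0 .. $ - 1];
            foreach (c; "ab")
            {
                ulong nxt = step(cur, c);
                if (nxt !in seen) { seen[nxt] = true; work ~= nxt; }
            }
        }
        writefln("k = %2s   NFA states = %2s   DFA states = %s",
                 k, nStates, seen.length);
    }
}

The matching itself would still be one table lookup per character; all 
the pain is in how many table rows you have to build up front.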

>
> I don't know how practical it is, but I had a thought to allow some of
> these nasty regexes to run better: use threads. In theory, and I have
> not done any work to test/prove this so it's very theoretical, you can
> have multiple threads running in parallel each testing a potential path.
> Granted, there's always the issue of spawning/managing threads and
> potentially overloading the system by spawning too many, however it may
> be a viable alternative if you really HAVE to execute a particularly
> nasty regex.

Isn't what you're describing an NFA?
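
If I'm reading you right, that's exactly what simulating an NFA already 
does, no threads required: you carry the set of states every potential 
path could be in and advance the whole set one input character at a 
time. A toy sketch in D (a hypothetical nfaMatch over a hand-built, 
epsilon-free transition table, nothing from std.regex):

import std.stdio;

// delta[s][c] = list of NFA states reachable from state s on character c.
bool nfaMatch(int[][char][] delta, int accepting, string input)
{
    bool[int] live = [0: true];              // only the start state is live
    foreach (c; input)
    {
        bool[int] next;
        foreach (s, isLive; live)            // advance every live path at once
            if (auto targets = c in delta[s])
                foreach (t; *targets)
                    next[t] = true;
        live = next;
        if (live.length == 0)
            return false;                    // every path died: no match
    }
    return (accepting in live) !is null;
}

void main()
{
    // Hand-built NFA for a(b|c)*d:  0 -a-> 1,  1 -b-> 1,  1 -c-> 1,  1 -d-> 2.
    int[][char][] delta;
    delta.length = 3;
    delta[0]['a'] = [1];
    delta[1]['b'] = [1];
    delta[1]['c'] = [1];
    delta[1]['d'] = [2];
    writeln(nfaMatch(delta, 2, "abccbd"));   // true
    writeln(nfaMatch(delta, 2, "abxd"));     // false
}

Worst case that's roughly O(number of NFA states) of work per input 
character: slower per character than a DFA, but with none of the 
compile-time blow-up and none of the backtracking pathologies.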

>
> Anyway, it's just a thought. I'd hate to lose the benefits of a NFA if
> we're afraid of the potential performance hit if a user writes a bad regex.
>
> Casey

You don't get a match-time performance hit with DFAs on gnarly regexen 
(if anything, they would be faster than NFAs); you just get regexen 
that take forever to compile. I wonder if it would be possible to 
mechanically detect the problem cases during DFA compilation and, on 
detection, switch over to an NFA engine.
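
One way the detection could work (purely a sketch of the idea, not a 
claim about how std.regex should be structured): run the subset 
construction with a hard cap on the number of DFA states, and if the 
cap is blown, throw the partial DFA away and match with a plain NFA 
simulation instead. The hypothetical tryCompileDfa below packs NFA 
state sets into ulong bitmasks (so at most 64 NFA states) and assumes 
an epsilon-free transition table like the one in my earlier sketch:

import std.stdio;

// Subset construction with a budget.  Returns the DFA transition table
// (subset -> char -> subset), or null if more than maxStates distinct
// subsets show up, in which case the caller falls back to NFA simulation.
ulong[char][ulong] tryCompileDfa(int[][char][] nfaDelta, size_t maxStates)
{
    ulong[char][ulong] dfa;
    bool[ulong] seen = [1UL: true];      // start subset = {state 0}
    ulong[] work = [1UL];                // subsets still to expand

    while (work.length)
    {
        ulong cur = work[$ - 1];
        work = work[0 .. $ - 1];

        ulong[char] trans;               // DFA transitions out of `cur`
        foreach (s; 0 .. nfaDelta.length)
        {
            if (!(cur & (1UL << s))) continue;
            foreach (c, targets; nfaDelta[s])
                foreach (t; targets)
                    trans[c] = trans.get(c, 0UL) | (1UL << t);
        }
        dfa[cur] = trans;

        foreach (c, nxt; trans)
            if (nxt !in seen)
            {
                if (seen.length >= maxStates)
                    return null;         // state blow-up detected: punt to the NFA
                seen[nxt] = true;
                work ~= nxt;
            }
    }
    return dfa;
}

void main()
{
    // The same hand-built, epsilon-free NFA for a(b|c)*d as before:
    // 0 -a-> 1,  1 -b-> 1,  1 -c-> 1,  1 -d-> 2 (accepting).
    int[][char][] nfa;
    nfa.length = 3;
    nfa[0]['a'] = [1];
    nfa[1]['b'] = [1];
    nfa[1]['c'] = [1];
    nfa[1]['d'] = [2];

    auto dfa = tryCompileDfa(nfa, 1000);
    writeln(dfa is null ? "fall back to NFA" : "DFA built");   // DFA built
    auto tiny = tryCompileDfa(nfa, 2);
    writeln(tiny is null ? "fall back to NFA" : "DFA built");  // cap of 2 blown
}

The cap turns the 2^n worst case into a bounded amount of wasted 
compile work, and the fallback path just pays for the slower (but still 
linear in the input) NFA scan.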

