Fast regular expressions (great work)

yidabu yidabu.nospamneed at yidabu.com
Sat May 19 03:16:29 PDT 2007


great work! 


Jascha Wetzel <[firstname]@mainia.de> Wrote:

> Here is an alpha release of a regular expression engine i'm working on:
> http://mainia.de/tdfaregexp-0.1-alpha.zip
> (artistic license 2.0)
> 
> It implements a slightly modified version of Ville Laurikari's tagged
> NFA method that allows for regular expression parsing in linear time as
> opposed to the very common backtracking method that requires exponential
> time (worst case).
> 
> As of now, the engine supports greedy and non-greedy operators,
> submatches and character classes. It compiles a regexp to a TDFA from
> which it can either be applied directly to an input string or compiled
> to D code.
> Supported operators: . | ( ) ? * + ?? *? +?
> 
> The regexp-to-D compilation cannot be done at compile-time, though,
> because the required data structures are too complex to be represented
> in tuples or strings.
> 
> Compile regexp.d and run it for example like this:
> regexp "(a*)([0-9a-fA-Z]*)(g|qwe)" aaaa987IUTqwe aaa 234234
> This will output the TNFA, TDFA and D Code and try to match the 3 input
> strings.
> The file regextest.d is an example of how to use the generated code.
> 
> There is still a lot of room for optimization, but i think that
> categorically, such a natively compiled, linear time regexp matcher
> should be the fastest possible.
> 
> Comments and bug reports are encouraged ;)




More information about the Digitalmars-d mailing list