Improving std.regex(p)

Ellery Newcomer ellery-newcomer at utulsa.edu
Sat Jun 19 12:10:55 PDT 2010


On 06/19/2010 03:35 AM, Ben Hanson wrote:
> I can't prove what the comlexity is for DFA compilation, but instead, I
> challenge anyone to show slow compilation times with any DFA compatible regex.
> As I don't have some D code yet, you could try the current lexertl code (just
> add a single rule). Compile as Release if using Visual C++, as the iterator
> debugging kills performance in VC++ 8 or above (one problem I guess wouldn't
> exist with a D codebase).
>
> As I say, the only obvious area for me that hammers compilation times is huge
> repeats. If you repeat an expression 50,0000 times or something, then yes, you
> will get a huge machine and it will take time to construct! :-)

et unicode?

>
> There are papers on tackling this with counting repeats, even for DFAs. If this
> area interests anyone, please let me know.

it does

>
> DFA compilation is particularly good for simplifying verbose regexes, even
> wihout DFA minimisation. Hybrid DFA/NFA engines have been written in the past
> and as far as feaures go, this is probably the ideal way to go. Russ Cox has
> some clever techniques too.

interesting

>
> Don't forget that grep is (traditionally) implemented with a DFA engine.

I didn't know that

>
> Of course, the real irony is that Henry Spencer wrote a much better regex
> engine for TCL (a DFA/NFA hybrid as I understand it) which is much better than
> his original NFA backtracker and yet everyone holds up the backtracking
> algorithm as the way to do things...
>
> That's inertia for you.
>
> Regards,
>
> Ben



More information about the Digitalmars-d mailing list