Would there be interest in a SERIOUS compile-time regex parser?

Mon Oct 16 10:40:50 PDT 2006

Don Clugston wrote:
> In the past, Eric and I both developed compile-time regex engines, but 
> they were proof-of-concept rather than something you'd actually use in 
> production code. I think this also applies to C++ metaprogramming regexp 
> engines, too.
> 
> I've had a bit of play around with the regexp code in Phobos, and have 
> convinced myself that it would be straightforward to create a 
> compile-time wrapper for the existing engine.
> 
> Usage could be something like:
> --------
> import std.stdio;   // for writefln
> 
> void main()
> {
>     char [] s = "abcabcabab";
>     // case-insensitive search
>     foreach(m; rexSearch!("ab+", "i")(s))
>     {
>         writefln("%s[%s]%s", m.pre, m.match(0), m.post);
>     }
> }
> --------
> 
> It would behave *exactly* like the existing std.regexp, except that 
> compilation into the internal form would happen via template 
> metaprogramming, so that
> (1) all errors would be caught at compile time, and
> (2) there'd be a minor speedup because the compilation step would not 
> happen at runtime, and
> (3) otherwise it wouldn't be any faster than the existing regexp. 
> However, there'd be no template code bloat, either.
> 
> Existing code would be unchanged. You could even write:
> 
> RegExp a = StaticRegExp!("ab?(ab*)+", "g");
> 
> (assign a pre-compiled regular expression to an existing Phobos RegExp).
> 
> There's potentially a greater speedup possible, because the RegExp class 
> could become a struct, with no need for any dynamic memory allocation; 
> but if this was done, mixing runtime and compile-time regexps together 
> wouldn't be as seamless. And of course there's loads of room for future 
> enhancement.
> 
> BUT...
> 
> The question is -- would this be worthwhile? I'm really not interested 
> in making another toy.
> It's straightforward, but tedious, and would double the length of 
> std.regexp.
> Would the use of templates be such a turn-off that people wouldn't use it?
> Do the benefits exceed the cost?

It's always difficult to foresee the ramifications of anything that is 
new in this sense.  I'm curious as to how many people have used the 
Boost implementation... maybe that would give you an idea of how much 
real-world potential it has.

FWIW, I could use it within Enki for regular expressions, when they're 
used in an input grammar.  It would yield some nice speed increases for 
regex-heavy designs, w/o having to reimplement it all for Enki's sake.  
The only requirement I have is that it must handle UTF correctly, 
preferably UTF-32 or a templated char type.

-- 
- EricAnderton at yahoo