Would there be interest in a SERIOUS compile-time regex parser?

Don Clugston dac at nospam.com.au
Tue Oct 17 08:50:28 PDT 2006


rm wrote:
> Don Clugston wrote:
>> In the past, Eric and I both developed compile-time regex engines, but
>> they were proof-of-concept rather than something you'd actually use in
>> production code. I think this also applies to C++ metaprogramming regexp
>> engines, too.
>>
>> I've had a bit of play around with the regexp code in Phobos, and have
>> convinced myself that it would be straightforward to create a
>> compile-time wrapper for the existing engine.
>>
>> Usage could be something like:
>> --------
>> void main()
>> {
>>     char [] s = "abcabcabab";
>>          // case insensitive search
>>     foreach(m; rexSearch!("ab+", "i")(s))
>>     {
>>         writefln("%s[%s]%s", m.pre, m.match(0), m.post);
>>     }
>> }
>> --------
>>
>> It would behave *exactly* like the existing std.regexp, except that
>> compilation into the internal form would happen via template
>> metaprogramming, so that
>> (1) all errors would be caught at compile time, and
>> (2) there'd be a minor speedup, because the compilation step would not
>> happen at runtime.
>> (3) Otherwise, it wouldn't be any faster than the existing regexp.
>> However, there'd be no template code bloat, either.
>>
>> Existing code would be unchanged. You could even write:
>>
>> RegExp a = StaticRegExp!("ab?(ab*)+", "g");
>>
>> (assign a pre-compiled regular expression to an existing Phobos RegExp).
>>
>> There's potentially a greater speedup possible, because the RegExp class
>> could become a struct, with no need for any dynamic memory allocation;
>> but if this was done, mixing runtime and compile-time regexps together
>> wouldn't be as seamless. And of course there's loads of room for future
>> enhancement.
>>
>> BUT...
>>
>> The question is -- would this be worthwhile? I'm really not interested
>> in making another toy.
>> It's straightforward, but tedious, and would double the length of
>> std.regexp.
>> Would the use of templates be such a turn-off that people wouldn't use it?
>> Do the benefits exceed the cost?
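For readers on a present-day compiler: Phobos later gained std.regex (the successor to std.regexp), whose ctRegex template builds the matcher at compile time. As far as I know it generates specialized matching code rather than reusing the runtime engine's internal form as proposed here, so it is a related idea, not this proposal. A minimal sketch of the loop above written against it, assuming a D2 compiler with a recent Phobos; the helper name describeMatches is hypothetical.

```d
// Sketch: the proposed rexSearch!("ab+", "i") loop, expressed with
// present-day std.regex (assumption: recent Phobos, not 2006 std.regexp).
import std.regex : ctRegex, matchAll;

/// Hypothetical helper: formats each match as pre[hit]post.
string[] describeMatches(string s)
{
    string[] lines;
    // ctRegex parses the pattern at compile time; "i" = case-insensitive.
    foreach (m; matchAll(s, ctRegex!("ab+", "i")))
        lines ~= m.pre ~ "[" ~ m.hit ~ "]" ~ m.post;
    return lines;
}
```

For "abcabcabab" this yields four lines, one per "ab", e.g. "abc[ab]cabab" for the second match.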
> 
> I haven't got as far as looking into the current regexp module yet.
> But otoh I've already done some of the homework:
> 
> template findChar(char[] stringToSearch, char charToFind)
> {
>   static
>     if ( stringToSearch.length == 0
>        || stringToSearch[0] == charToFind )
>       const int findChar = 0;
>     else
>       const int findChar
>          = 1 + findChar!( stringToSearch[1..stringToSearch.length]
>                         , charToFind);
> }
> 
> gives the position of the char in the string; if the result equals the
> length of stringToSearch, charToFind is not present.
> 
> I've got some others as well; I can parse a string literal into an
> integer :-)
> 
> I'm willing to give a hand if you want.
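Later versions of D added compile-time function evaluation (CTFE), which lets helpers like findChar and the literal-to-integer parser be written as ordinary functions instead of recursive templates. A minimal sketch, assuming a D2 compiler; the names findCharCT and parseUintCT are hypothetical, and the parser handles unsigned decimal literals only.

```d
/// Index of the first occurrence of c in s, or s.length if c is absent
/// (the same convention as the findChar template above).
size_t findCharCT(string s, char c)
{
    foreach (i, ch; s)
        if (ch == c)
            return i;
    return s.length;
}

/// Parses an unsigned decimal literal such as "1234" (overflow unchecked).
/// A bad digit fails the assert, which is a compile error under CTFE.
uint parseUintCT(string s)
{
    assert(s.length > 0, "empty literal");
    uint result = 0;
    foreach (ch; s)
    {
        assert(ch >= '0' && ch <= '9', "not a decimal digit");
        result = result * 10 + (ch - '0');
    }
    return result;
}

// static assert forces the calls to be evaluated at compile time.
static assert(findCharCT("abcabc", 'c') == 2);
static assert(findCharCT("abc", 'z') == 3);
static assert(parseUintCT("1234") == 1234);
```

The same functions also work at runtime, which is the main practical advantage over the template form.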

Thanks.
Some of my old code is on dsource, in ddl/meta; it was much more 
difficult back then, when there were so many compiler bugs.
meta.nameof is the code I'm proudest of.

I have the basic framework in, and I've made progress on the character 
class stuff, which I think is the most complicated part (since it's full 
of bit masking operations on arrays). So far everything has been quite 
straightforward, and no innovation has been required.
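To illustrate the kind of bit masking involved, here is a minimal sketch of a character class stored as a 256-bit set, one bit per 8-bit char value, with union and complement done word by word. The CharClass type and its methods are hypothetical; this is not std.regexp's actual internal layout, and it assumes a D2 compiler.

```d
// Hypothetical 256-bit character-class set: one bit per 8-bit char value.
// Illustrates the bit-masking idea only; not std.regexp's real layout.
struct CharClass
{
    uint[8] bits; // 8 words * 32 bits = 256 possible char values

    void add(char c)
    {
        bits[c >> 5] |= 1u << (c & 31); // word index, then bit within word
    }

    void addRange(char lo, char hi)
    {
        foreach (c; cast(uint) lo .. cast(uint) hi + 1)
            add(cast(char) c);
    }

    bool contains(char c) const
    {
        return (bits[c >> 5] & (1u << (c & 31))) != 0;
    }

    // Union, as needed for a class like [a-cx-z]: OR the words together.
    CharClass opBinary(string op : "|")(const CharClass rhs) const
    {
        CharClass r;
        foreach (i; 0 .. bits.length)
            r.bits[i] = bits[i] | rhs.bits[i];
        return r;
    }

    // Complement, as needed for a negated class like [^a-c].
    CharClass negated() const
    {
        CharClass r;
        foreach (i; 0 .. bits.length)
            r.bits[i] = ~bits[i];
        return r;
    }
}

// Built at compile time (CTFE), since an enum initializer must be constant.
enum CharClass lowerAbc = () { CharClass cc; cc.addRange('a', 'c'); return cc; }();
static assert(lowerAbc.contains('b') && !lowerAbc.contains('d'));
```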

A fantastically helpful thing you could do immediately would be to 
improve the code coverage of the unit tests in std.regexp. Currently 
it's only about 40%. For my purposes, I only need high test coverage, 
not necessarily sensible tests (I'll just test the compiled output). 
Sorry that it's not a very sexy job.
(Hint: we need some tests for the undocumented features like "a*?").
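As an example of the kind of test meant here, a minimal sketch of unit tests for lazy quantifiers. std.regexp has since been superseded by std.regex, so this is written against std.regex's matchFirst and regex rather than the 2006 std.regexp API; the helper firstHit is hypothetical.

```d
// Lazy-quantifier tests, written against present-day std.regex
// (assumption: recent Phobos; not the 2006 std.regexp API).
import std.regex : matchFirst, regex;

/// Hypothetical helper: text of the leftmost match, or "" if none.
string firstHit(string input, string pattern)
{
    auto m = matchFirst(input, regex(pattern));
    return m.empty ? "" : m.hit;
}

unittest
{
    // "a*?" prefers zero repetitions, so it matches "" at position 0.
    assert(firstHit("aaa", "a*?") == "");
    // "a+?" must take one 'a' but no more.
    assert(firstHit("aaab", "a+?") == "a");
    // Laziness still extends as far as needed for the rest to match.
    assert(firstHit("aaab", "a*?b") == "aaab");
    // "a??" prefers to skip the 'a', but backtracks so "b" can match.
    assert(firstHit("ab", "a??b") == "ab");
}
```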



More information about the Digitalmars-d mailing list