Would there be interest in a SERIOUS compile-time regex parser?

Don Clugston dac at nospam.com.au
Tue Oct 17 03:04:02 PDT 2006


Walter Bright wrote:
> Don Clugston wrote:
>> The question is -- would this be worthwhile? I'm really not interested 
>> in making another toy.
>> It's straightforward, but tedious, and would double the length of 
>> std.regexp.
>> Would the use of templates be such a turn-off that people wouldn't use 
>> it?
>> Do the benefits exceed the cost?
> 
> Yes, for the following reasons:
> 
> 1) I think it would make for faster regexp's, more than just by omitting 
> the compile step. That's because the bytecoded program wouldn't need to 
> be interpreted.

I haven't worked out how to do backtracking sensibly when using mixins, 
so what I'm doing is simply compiling to the bytecoded program at 
compile time. Later on, it would be possible to add another step which 
looks for simple pattern types, and generates specific code for them (my 
guess is that 90% of all regexps use just one of a few simple categories).

> 2) When I show the current one to people, as opposed to Eric Niebler's 
> C++ one, the response is "but the D one is just a toy" with the 
> implication that D is just a toy.

<raise eyebrows> I wonder if anyone is actually using the C++ regexp in 
production code? Personally I get the feeling that so much of boost is 
extremely clever, but not terribly practical.

> 3) I wrote std.regexp long before people needed or asked for it. I knew 
> it would eventually become a critically needed module, and that came to 
> pass. It was nice that it was there when they needed it, and the time 
> passed had ensured that it was a solid piece of code.
> 
> 4) Your crafting of the current toy version was the catalyst for a big 
> leap forward in D's templating system. Doing a professional one may 
> expose critical problems that need to be fixed, and even if no such 
> flaws are discovered, it would prove that D's TMP capabilities are up to 
> scratch.
> 
> Sure, a lot of people are turned off by templates. That's one of my 
> motivations for making things like associative arrays usable without 
> templates. But for the people who do use templates, this can be a very 
> big deal.

There's a potential piece of syntax sugar that IMHO would be a killer 
feature. If it were possible to overload a function based on whether a 
parameter is a compile-time literal, or a variable, the use of template 
metaprogramming could become completely invisible to the user.

In my code, the situation is basically like this:
---------
class RegExp
{
   char [] program;

   void makeProgramAtCompileTime(char [] pattern)()
   {
	program = RegExpCompile!(pattern);
   }

   void makeProgramAtRunTime(char [] pattern)
   {
     program = runtimecompile(pattern);
   }
}
---------

Imagine if there were some way to create an alias
makeProgram(char [] pattern, int otherparameters)

which became
makeProgramAtCompileTime!(pattern)(otherparameters);
if pattern were a compile time literal,
but became
makeProgramAtRunTime(pattern, otherparameters);
if pattern were a variable.

Then the user wouldn't even need to know that they were using templates.
It would also let you do some pretty amazing stuff.
Consider something as basic as sin(x).

real sin_compiletime(real x)()
{
   static if (x==0) return x; // we can do this one
   else return sin_runtime(x); // everything else is too hard, pass it 
to the runtime version.
}

Which is much easier than training the optimiser that it can replace 
sin(0) with 0. For more complex situations, you could perform 
optimisations that are technically infeasible for a normal optimiser.
You could also catch more errors at compile time:

real sqrt_compiletime(real x)()
{
   static assert(x>=0, "Error, square root of negative number");
   return sqrt_runtime(x);
}

There probably aren't many cases where this kind of optimisation would 
actually be worth doing, but for those cases, the benefit could be 
considerable.

One possible syntax would be something like:

real sqrt(template real x)
{
   static if (is(x: const)) return sqrt_compiletime!(x);
   else return sqrt_runtime(x);
}

Another might be "template alias".

Something to think about; not a proposal at present.



More information about the Digitalmars-d mailing list