Template regexes, version 2

Georg Wrede georg.wrede at nospam.org
Tue Feb 21 05:15:50 PST 2006


Don Clugston wrote:
> I've had another go at doing compile-time regexps. With all the bugfixes 
> in recent releases, it's now possible to do it with mixins, generating 
> local functions instead of global ones. This provides _considerably_ 
> more flexibility.
> The ultimately generated functions can look like:
> char [] firstmatch(char [] s)
> {
>    char [][] results;
>    int someOtherResult;
>    void func(char [] s) {
>       // this parses the string, using the original regexp string.
>       // intermediate results (eg expressions in parentheses) are
>       // in the local variables: results[][], someOtherResult, etc.
>    }
>    func(s);
>    return results[0];
> }
> Usage is like:
> char [] x = firstmatch!("ab+")(someStr);
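
For readers on a current compiler, here is a minimal sketch of the same call shape. It assumes present-day Phobos (`std.regex` with `ctRegex` and `matchFirst`), which did not exist when this was posted, and it is not Don's mixin implementation. The `firstmatch` below is a hypothetical stand-in that only borrows the name from the quoted example.

```d
import std.regex;

// Hypothetical stand-in for the quoted firstmatch: the pattern is a
// compile-time (template) argument, the subject string a runtime one.
string firstmatch(string pattern)(string s)
{
    // ctRegex builds the matcher for `pattern` at compile time.
    auto m = matchFirst(s, ctRegex!pattern);
    // An empty capture set means no match; otherwise return the whole hit.
    return m.empty ? null : m.hit;
}

void main()
{
    string someStr = "xxabbbyy";
    string x = firstmatch!("ab+")(someStr);
    assert(x == "abbb");
}
```

A malformed pattern here is rejected when the template is instantiated, which is the main practical benefit of the pure-static form.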


> It seems to me that there are 3 types of regexes:
> * pure static -- where the regex string is a string literal, known at 
> compile time
> * pure dynamic -- eg, as used in a grep utility.
> * pseudo-static. By this I mean regexps where the structure is constant, 
> but some strings are replaced with variables.

Pure static is what I always wanted with regexps in D!

Pure dynamic is of less use to me, but others may disagree. And 
that is taken care of by Walter now.

Pseudo-static is cool!! And I believe *immensely* useful.

If I do log analyzers, net statistics programs, serious data mining 
frameworks, then pseudo-static regexps are what I do all day long! And, 
at the other end, even for most 'dscript' tasks this is the core!
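
To make the three categories concrete, here is a rough sketch. It assumes present-day Phobos (`std.regex` with `ctRegex`, `regex` and `matchFirst`), not the mixin library under discussion. All function names are made up for illustration, and the pseudo-static case is only approximated: the fixed structure is compiled statically and the variable part is compared at run time.

```d
import std.regex;

// Pure static: the pattern is a literal, compiled at compile time.
bool isHexStatic(string s)
{
    return !matchFirst(s, ctRegex!(`^[0-9a-f]+$`)).empty;
}

// Pure dynamic: the pattern is only known at run time (the grep case).
bool matchesDynamic(string s, string pattern)
{
    return !matchFirst(s, regex(pattern)).empty;
}

// Pseudo-static (approximation): the structure "<user>: <message>" is
// constant, while `user` is a runtime variable checked after matching.
string messageFrom(string line, string user)
{
    auto m = matchFirst(line, ctRegex!(`^(\w+): (.*)$`));
    if (m.empty || m[1] != user)
        return null;
    return m[2];
}

void main()
{
    assert(isHexStatic("deadbeef"));
    assert(matchesDynamic("hello", "l+"));
    assert(messageFrom("don: hi there", "don") == "hi there");
}
```

A log analyzer would mostly live in the third form: one fixed line layout, with the host name or user of interest supplied at run time.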

> As far as runtime efficiency goes, it's almost ideal, 


> ...And then I return to the D newsgroups after a week and find the 
> goalposts have moved: regexps are now built into the language.

Awwww, they're just runtime. Mere syntax charades to smoke-and-mirror 
away the fact.

> The mixin regexps are only at an early stage of development, but given 
> the current discussions I think it's important to know what can be done
> with templates (probably more than most people expect). 

Most people couldn't.  :-)

But then again, "a language can't be Serious if all its features are 
graspable to VB programmers".

> In the case of 
> what I've called "pseudo-static" regexps, they are arguably more 
> powerful than the built-in regexps of DMD 0.147.

I DEMAND that pseudo-static regexps be in the very next release of DMD.

> I don't know where to go from here. There are a few possibilities:

> 1. use template regexps as a demonstration of the power of D templates.
> --> Implement reduced syntax, keep the code simple and not well 
> optimised; more of a proof-of-concept; go for "maximum coolness".

From a "marketing point of view", that may be wise.

> 2. Like 1, but with optimisations; try to be the fastest regexp of any 
> language. (cleaner-faster-better ... but we use the built-in regexps 
> anyway <g>).

I would probably hardly use the "built-ins" (if that refers to the current 
runtime library-cum-syntax stuff) at all.

> 3. Create a standard library. This is what I was aiming for,
> but I don't think it makes sense any more.

Of course it does!

> 4. potential integration into the language. Is this even possible?

Wanna guess my vote on this?!

> Probably the most sensible is to go with 1, to wake up the C++/boost 
> crowd. 

Why wake 'em up????

Unless... you expect them all to abandon C++ on sight and stampede over 

Heh, I wonder what 20 Dons would achieve together!
