Understanding regexes

Don Clugston dac at nospam.com.au
Thu Feb 23 07:33:14 PST 2006


Georg Wrede wrote:
> Don Clugston wrote:
>> Georg Wrede wrote:
>>> Walter Bright wrote:
>>>> Georg Wrede wrote:
>>>>> Walter Bright wrote:
>>>>>
>>>>>> If the compiler is to constant fold regular expressions, then it 
>>>>>> needs to build in to the compiler exactly what would happen if
>>>>>> the regex code was evaluated at runtime.
>>>>>
>>>>> Yes. IMHO in essence, the binary machine code, which the runtime
>>>>> also would build. What I have a hard time seeing is, how this
>>>>> differs from building a normal function at compile time?
>>>>
>>>> Consider the strlen() function. Compiling a strlen() function and
>>>> generating machine code for it is a very different thing from the
>>>> compiler knowing what strlen is and replacing:
>>>>
>>>> strlen("abc")
>>>>
>>>> with:
>>>>
>>>> 3
>>>
>>> Either I'm getting too old for this business, or you're only giving 
>>> pseudo answers.
>>>
>>> (1) If we were to stop the compiler dead in its tracks, and I 
>>> compiled the function "manually" and returned it to the compiler, 
>>> would we still have a problem here?
>>
>>
>> That would be OK. The issue is that the compiler is a tool for 
>> converting text to machine code. It has no mechanism for executing the 
>> machine code.
> 
> Aaaaaah... heureka.
> 
> So there's a wavelength problem here!
> 
> What I've been talking all along, is 'a regexp compiled into a function, 
> but _not_run_ at compile time.

Oh dear, I think I've just confused you. I was only referring to strlen, 
not to regexps. I was trying to explain Walter's statement about why 
it's difficult for a compiler writer.

> ** So, Don's regexps can be both "compiled" and "run" at compile time, 
> whereas what I've been wishing all along is a "compile-time compiled but 
> not compile-time run" regexp!

No, you were right the first time. At compile time, the regexp pattern 
string is compiled into an ordinary function.

Example: the trivial case

bool b = test!("abc")(str);

compiles to something like:

bool test_a(char [] str)
{
   return str.length>=3 && str[0..3]=="abc";
}

bool b = test_a(str);

It doesn't actually call the test_a function at compile time.

It's only something like strlen!("abc"), where all of the parameters are 
known at compile time, which is actually "run" at compile time. In the 
regexp case, it's the "make a regexp engine" code which is run at compile 
time. The engine itself is only run at runtime.
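
To make the two cases concrete, here is a minimal sketch in present-day D 
syntax. The names ctLength and test are purely illustrative and are not 
taken from the actual regexp code; the "pattern" is just a literal prefix, 
as in the trivial case above.

```d
// Hypothetical sketch; not the actual compile-time regexp library.

// Case 1: "run" at compile time. Everything is known to the compiler,
// so the whole expression folds to a constant.
template ctLength(string s)
{
    enum size_t ctLength = s.length;
}

// Case 2: "compiled" at compile time, run at runtime. The pattern is a
// template argument, so each pattern yields its own ordinary function.
// Only the input string is a runtime value.
bool test(string pattern)(const(char)[] str)
{
    return str.length >= pattern.length
        && str[0 .. pattern.length] == pattern;
}

void main()
{
    // Evaluated entirely by the compiler; no code runs for this.
    static assert(ctLength!("abc") == 3);

    // The function was generated at compile time,
    // but the matching itself happens here, at runtime.
    const(char)[] input = "abcdef";
    bool b = test!("abc")(input);
    assert(b);
    assert(!test!("abc")("xyz"));
}
```

The static assert corresponds to the strlen!("abc") case, and the call to 
test!("abc") corresponds to the regexp case: nothing is matched until the 
program runs.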

> In other words, a profoundly normal function, just that it happens to be 
> written in RegexpLanguage instead of vanilla D (Or C, or asm).

Exactly.


