Would there be interest in a SERIOUS compile-time regex parser?

Don Clugston dac at nospam.com.au
Fri Oct 27 02:36:50 PDT 2006


Bruno Medeiros wrote:
> Don Clugston wrote:
>> Bruno Medeiros wrote:
>>> Don Clugston wrote:
>>>>
>>>> It would behave *exactly* like the existing std.regexp, except that 
>>>> compilation into the internal form would happen via template 
>>>> metaprogramming, so that
>>>> (1) all errors would be caught at compile time, and
>>>> (2) there'd be a minor speedup because the compilation step would 
>>>> not happen at runtime, and
>>>> (3) otherwise it wouldn't be any faster than the existing regexp. 
>>>> However, there'd be no template code bloat, either.
>>>>
>>>
>>> Whoa, "internal form" and "bytecoded program"? Out of curiosity, for 
>>> those ignorant on the matter, like me, what kind of processing is 
>>> done when creating a regexp, in terms of this internal form you speak 
>>> of? Is it converted to a simple internal form, or something more 
>>> complex? Bytecoded program seems pretty complex stuff, especially for 
>>> a regexp (isn't the translation direct) ?
>>
>> It's *nowhere near* as complicated as it sounds. If you look into the 
>> source of std.regexp, you'll see what I mean -- there's a function 
>> called 'compile'.
> 
>  From my quick look, it is a simpler representation of the regexp 
> string, but it still a linear (opcode) representation. It is not a graph 
> state like I'd expect (and like benji said). (one is used/created later? 
> I couldn't tell from std.regexp.test() )

The graph part is done via 'goto' opcodes. But the majority of the 
simple graph branches get converted into filters.



More information about the Digitalmars-d mailing list