Would there be interest in a SERIOUS compile-time regex parser?

Benji Smith dlanguage at benjismith.net
Mon Oct 23 18:13:03 PDT 2006


Bruno Medeiros wrote:
> Whoa, "internal form" and "bytecoded program"? Out of curiosity, for 
> those ignorant on the matter, like me, what kind of processing is done 
> when creating a regexp, in terms of this internal form you speak of?

Generally speaking, regular expressions are compiled into a directed 
state graph. For example, a regex like this:

   a(b|c)d

...compiles into a state-graph like this:

<mono-spaced>

      b
a -<   >- d
      c

</mono-spaced>

The regex engine is a state-machine-crawler that knows how to navigate 
from node to node in the graph, including rules about when it is allowed 
to backtrack, etc.

So, the compiled representation of a regular expression isn't "opcodes" 
in the traditional sense of the word. It's just a very particular kind 
of state graph.

--Benji



More information about the Digitalmars-d mailing list