Would there be interest in a SERIOUS compile-time regex parser?
Benji Smith
dlanguage at benjismith.net
Mon Oct 23 18:13:03 PDT 2006
Bruno Medeiros wrote:
> Whoa, "internal form" and "bytecoded program"? Out of curiosity, for
> those ignorant on the matter, like me, what kind of processing is done
> when creating a regexp, in terms of this internal form you speak of?
Generally speaking, regular expressions are compiled into a directed
state graph. For example, a regex like this:
a(b|c)d
...compiles into a state-graph like this:
<mono-spaced>
b
a -< >- d
c
</mono-spaced>
The regex engine is a state-machine-crawler that knows how to navigate
from node to node in the graph, including rules about when it is allowed
to backtrack, etc.
So, the compiled representation of a regular expression isn't "opcodes"
in the traditional sense of the word. It's just a very particular kind
of state graph.
--Benji
More information about the Digitalmars-d
mailing list