re2d lexer generator

Ulya skvadrik at gmail.com
Mon Nov 25 16:01:54 UTC 2024


The regular expression compiler [re2c](http://re2c.org) now [supports 
D](http://re2c.org/releases/release_notes.html#release-4-0).

A short intro from the official website: *re2c* stands for 
*Regular Expressions to Code*. It is a free and open-source lexer 
generator that supports C, C++, D, Go, Haskell, Java, JavaScript, 
OCaml, Python, Rust, V, Zig, and can be extended to other 
languages by implementing a single [syntax 
file](http://re2c.org/manual/manual_d.html#syntax-files). The 
primary focus of re2c is on generating *fast* code: it compiles 
regular expressions to deterministic finite automata and 
translates them into direct-coded lexers in the target language 
(such lexers are generally faster and easier to debug than their 
table-driven analogues). The secondary focus of re2c is on 
*flexibility*: it does not assume a fixed program template; 
instead, it allows the user to embed lexers anywhere in the 
source code and configure them to avoid unnecessary buffering and 
bounds checks. The internal algorithm used by re2c is based on a 
special kind of deterministic finite automaton: [lookahead 
TDFA](http://re2c.org/2022_borsotti_trofimovich_a_closer_look_at_tdfa.pdf). 
These automata are as fast as ordinary DFAs, but they are also 
capable of performing submatch extraction with minimal overhead.
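To illustrate the difference between direct-coded and table-driven lexers mentioned above, here is a minimal hand-written sketch in C. It is not re2c output. The function names, the regular expression `[1-9][0-9]*`, and the state and character-class names are all made up for this example. Both functions return the length of the longest matching prefix, or 0 if there is no match, and both assume NUL-terminated input, so the terminator doubles as the end-of-input check.

```c
#include <stddef.h>

/* Direct-coded variant (hypothetical example): every DFA state is a
 * position in the code, and every transition is a branch. */
size_t match_decimal_direct(const char *s)
{
    const char *p = s;

    /* Start state: the first character must be a digit 1-9. */
    if (*p < '1' || *p > '9') return 0;
    ++p;

digits: /* Accepting state: consume any further digits 0-9. */
    if (*p >= '0' && *p <= '9') { ++p; goto digits; }

    /* Any other character (including NUL) ends the match. */
    return (size_t)(p - s);
}

/* Table-driven variant of the same automaton: states are integers
 * and transitions are looked up in an array. */
enum { C_ZERO, C_NONZERO, C_OTHER, NCLASSES };
enum { S_START, S_DIGITS, S_STOP };

static int char_class(unsigned char c)
{
    if (c == '0') return C_ZERO;
    if (c >= '1' && c <= '9') return C_NONZERO;
    return C_OTHER;
}

static const int transitions[2][NCLASSES] = {
    /* S_START  */ { S_STOP,   S_DIGITS, S_STOP },
    /* S_DIGITS */ { S_DIGITS, S_DIGITS, S_STOP },
};

size_t match_decimal_table(const char *s)
{
    const char *p = s;
    int state = S_START;

    for (;;) {
        int next = transitions[state][char_class((unsigned char)*p)];
        if (next == S_STOP) break;
        state = next;
        ++p;
    }
    /* Only S_DIGITS is an accepting state. */
    return state == S_DIGITS ? (size_t)(p - s) : 0;
}
```

For instance, both functions return 3 for the input `"123abc"` and 0 for `"0"`. In the direct-coded version the current state is implicit in the program counter, which is why stepping through it in a debugger follows the automaton directly.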

There is a [detailed user 
guide](http://re2c.org/manual/manual_d.html) and an [online 
playground](http://re2c.org/playground/?example=d/01_basic.re) 
with many examples.

