Improving std.regex(p)

Jacob Carlborg doob at me.com
Fri Jun 18 07:13:43 PDT 2010


On 2010-06-18 06:44, Andrei Alexandrescu wrote:
> There are currently two regexen in the standard library. The older one,
> std.regexp, is time-tested but only works with UTF8 and has a clunkier
> API. The newer one, std.regex, isolates the engine from the
> matches (and can therefore reuse and cache engines more easily), and supports
> all character widths. But it's less tested and doesn't have that great
> of an interface because it pretty much inherits the existing one.
>
> I wish to improve regex handling in Phobos. The most important
> improvement is not in the interface - it's in the engine. The current
> engine is adequate but nothing to write home about, and for simple
> regexen is markedly slower than equivalent hand-written code (e.g.
> matching whitespace). One great opportunity would be for D to leverage
> its uncanny compile-time evaluation abilities and offer a regex that
> parses the pattern during compilation:
>
> foreach (s; splitter(line, sregex!",[ \t\r]*")) { ... }
>
> Such a static regex could be simpler than a full-blown regex with
> captures and backreferences etc., but it would have guaranteed
> performance (e.g. it would be an automaton instead of a backtracking
> engine) and would be darn fast because it would generate custom code for
> each regex pattern.
>
> See related work:
>
> http://google-opensource.blogspot.com/2010/03/re2-principled-approach-to-regular.html
>
>
> If we get as far as implementing what RE2 can do with compile-time
> evaluation, people will definitely notice.
>
> If there's anyone who'd want to tackle such a project (for Phobos or
> not), I highly encourage you to do so.
>
>
> Andrei

There is already a compile-time regular expression engine available in the
DDL project at dsource; it's in the meta package:
http://dsource.org/projects/ddl/browser/trunk/meta I don't know if it's
still usable.
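
To make the quoted idea concrete, here is a minimal sketch of the kind of
code a static regex might boil down to for the pattern ",[ \t\r]*" from the
example above. It is hand-written rather than generated from a pattern
string, and it is not taken from DDL or Phobos: the names inSet, matchPrefix
and separator are hypothetical, chosen only for illustration. The point is
that the pattern's pieces are template arguments, so each instantiation is
specialized at compile time and runs as a simple loop with no backtracking.

```d
import std.stdio;

// Hypothetical helper (not from Phobos or DDL): true if c occurs in `set`.
// `set` is a template argument, so it is fixed at compile time.
bool inSet(string set)(char c) pure nothrow @safe
{
    foreach (m; set)
        if (c == m)
            return true;
    return false;
}

// Hypothetical matcher: accepts `first` followed by zero or more
// characters from `set`. Returns the length of the matched prefix,
// or 0 if the input does not start with `first`.
size_t matchPrefix(char first, string set)(const(char)[] s) pure nothrow @safe
{
    if (s.length == 0 || s[0] != first)
        return 0;
    size_t i = 1;
    while (i < s.length && inSet!set(s[i]))
        ++i;
    return i;
}

// One instantiation per pattern; this one corresponds to ",[ \t\r]*".
alias separator = matchPrefix!(',', " \t\r");

// The same code also runs during compilation (CTFE).
static assert(separator(", \tabc") == 3);
static assert(separator("abc") == 0);

void main()
{
    // Prints the length of the separator at the start of the input.
    writeln(separator(",  x"));
}
```

A real implementation would have to parse the pattern string at compile
time and emit such a matcher itself; the sketch only shows the target shape.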

-- 
/Jacob Carlborg


More information about the Digitalmars-d mailing list