Improving std.regex(p)

Lars T. Kyllingstad public at kyllingen.NOSPAMnet
Thu Jun 17 23:33:28 PDT 2010


On Thu, 17 Jun 2010 21:44:03 -0700, Andrei Alexandrescu wrote:

> There are currently two regexen in the standard library. The older one,
> std.regexp, is time-tested but only works with UTF-8 and has a clunkier
> API. The newer one, std.regex, isolates the engine from the matches (and
> can therefore reuse and cache engines more easily) and supports all
> character widths. But it's less tested and doesn't have that great of an
> interface because it pretty much inherits the existing one.
> 
> I wish to improve regex handling in Phobos. The most important
> improvement is not in the interface - it's in the engine. The current
> engine is adequate but nothing to write home about, and for simple
> regexen is markedly slower than equivalent hand-written code (e.g.
> matching whitespace). One great opportunity would be for D to leverage
> its uncanny compile-time evaluation abilities and offer a regex that
> parses the pattern during compilation:
> 
> foreach (s; splitter(line, sregex!",[ \t\r]*")) { ... }
> 
> Such a static regex could be simpler than a full-blown regex with
> captures and backreferences etc., but it would have guaranteed
> performance (e.g. it would be an automaton instead of a backtracking
> engine) and would be darn fast because it would generate custom code for
> each regex pattern.
> 
> See related work:
> 
> http://google-opensource.blogspot.com/2010/03/re2-principled-approach-to-regular.html
> 
> If we get as far as implementing what RE2 can do with compile-time
> evaluation, people will definitely notice.
> 
> If there's anyone who'd want to tackle such a project (for Phobos or
> not), I highly encourage you to do so.


There is the 'scregexp' project on dsource:

    http://www.dsource.org/projects/scregexp/

It's D1/Tango, but maybe it could be adapted to D2/Phobos?  It would at 
least serve as a starting point for anyone wanting to try their hand at 
doing this.
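
To make the quoted idea a bit more concrete, here is a minimal sketch
(mine, not taken from scregexp or Phobos) of the basic trick: pass the
pattern as a template argument so that it is known at compile time.  It
does not parse real regex syntax; it only handles the "zero or more
characters from a fixed set" case, and the names inSet and skipSet are
made up for illustration.

```d
import std.stdio;

// Hypothetical helper: true if c is one of the characters in set.
bool inSet(char c, string set)
{
    foreach (s; set)
        if (s == c)
            return true;
    return false;
}

// Hypothetical stand-in for a static regex of the form "[set]*".
// The character set is a template argument, so every distinct set
// gets its own instantiation with the set baked in as a constant.
size_t skipSet(string set)(string input)
{
    size_t i = 0;
    while (i < input.length && inSet(input[i], set))
        ++i;
    return i; // number of leading characters that belong to the set
}

// The same function can also be evaluated during compilation (CTFE).
static assert(skipSet!" \t\r"(" \t x") == 3);

void main()
{
    string line = "  \tfoo";
    writeln(line[skipSet!" \t\r"(line) .. $]); // text after the whitespace
}
```

A real sregex would have to turn the whole pattern into an automaton at
compile time, which this sketch does not attempt.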

-Lars

