Improving std.regex(p)
Nick Sabalausky
a at a.a
Fri Jun 18 22:08:52 PDT 2010
"Andrei Alexandrescu" <SeeWebsiteForEmail at erdani.org> wrote in message
news:hvetik$2k7j$1 at digitalmars.com...
> There are currently two regexen in the standard library. The older one,
> std.regexp, is time-tested but only works with UTF8 and has a clunkier
> API. The newer one, std.regex, isolates the engine from the matches (and
> can therefore reuse and cache engines more easily) and supports all
> character widths. But it's less tested and its interface isn't great,
> because it pretty much inherits the existing one.
>
> I wish to improve regex handling in Phobos. The most important improvement
> is not in the interface - it's in the engine. The current engine is
> adequate but nothing to write home about, and for simple regexen is
> markedly slower than equivalent hand-written code (e.g. matching
> whitespace). One great opportunity would be for D to leverage its uncanny
> compile-time evaluation abilities and offer a regex that parses the
> pattern during compilation:
>
> foreach (s; splitter(line, sregex!",[ \t\r]*")) { ... }
>
> Such a static regex could be simpler than a full-blown regex with captures
> and backreferences etc., but it would have guaranteed performance (e.g. it
> would be an automaton instead of a backtracking engine) and would be darn
> fast because it would generate custom code for each regex pattern.
>
> See related work:
>
> http://google-opensource.blogspot.com/2010/03/re2-principled-approach-to-regular.html
>
> If we get as far as implementing what RE2 can do with compile-time
> evaluation, people will definitely notice.
>
> If there's anyone who'd want to tackle such a project (for Phobos or not),
> I highly encourage you to do so.
>
>
Whatever engine comes out of this would do well to pay attention to the
following article, if it doesn't already:
http://swtch.com/~rsc/regexp/regexp1.html
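
For anyone who hasn't read it, the gist of that article is that tracking the set of all reachable automaton states at once, rather than backtracking, keeps matching time linear in the input. Below is a rough D sketch of that idea, under heavy assumptions: it handles only literals, '.', and the postfix quantifiers * + ? applied to a single character, it matches whole inputs only, and it works on UTF-8 code units. The names Item, parse and fullMatch are invented for this sketch and are not part of Phobos or of any proposed sregex.

```d
import std.stdio;

/// One pattern element: a character (or '.' wildcard) plus an optional quantifier.
struct Item
{
    char c;      // literal character, or '.' for "any code unit"
    char quant;  // '*', '+', '?', or '\0' for "exactly once"
}

/// Parse a tiny pattern subset: literals, '.', and postfix * + ?
/// (no groups, alternation, classes or escapes in this sketch).
Item[] parse(string pattern)
{
    Item[] items;
    size_t i = 0;
    while (i < pattern.length)
    {
        auto item = Item(pattern[i], '\0');
        ++i;
        if (i < pattern.length &&
            (pattern[i] == '*' || pattern[i] == '+' || pattern[i] == '?'))
        {
            item.quant = pattern[i];
            ++i;
        }
        items ~= item;
    }
    return items;
}

/// Whole-input match by simulating every reachable state at once.
/// State i means "elements 0 .. i are done"; state items.length accepts.
/// Cost is O(input.length * items.length), with no backtracking.
bool fullMatch(const(Item)[] items, string input)
{
    immutable n = items.length;
    auto current = new bool[n + 1];
    auto next = new bool[n + 1];

    // Follow "skip" edges: an element quantified with * or ? may match nothing.
    // Skips only go from i to i + 1, so one forward pass is enough.
    void addSkips(bool[] states)
    {
        foreach (i; 0 .. n)
            if (states[i] && (items[i].quant == '*' || items[i].quant == '?'))
                states[i + 1] = true;
    }

    current[0] = true;
    addSkips(current);

    foreach (char ch; input)
    {
        next[] = false;
        foreach (i; 0 .. n)
        {
            if (!current[i]) continue;
            if (items[i].c != '.' && items[i].c != ch) continue;
            next[i + 1] = true;              // element i consumed ch
            if (items[i].quant == '*' || items[i].quant == '+')
                next[i] = true;              // element i may consume more
        }
        addSkips(next);
        auto tmp = current;
        current = next;
        next = tmp;
    }
    return current[n];
}

void main()
{
    // The article's pathological shape: n optional a's followed by n a's.
    assert(fullMatch(parse("a?a?a?aaa"), "aaa"));
    assert(!fullMatch(parse("a?a?a?aaa"), "aa"));
    assert(fullMatch(parse("a*b"), "aaab"));
    assert(!fullMatch(parse("a*b"), "aba"));

    // Both functions are plain D, so they can also run during compilation.
    static assert(fullMatch(parse(".+"), "x"));

    writeln("all checks passed");
}
```

The static assert at the end is only meant to hint at the compile-time angle from the original post: the pattern is parsed and checked by CTFE. A real static regex would presumably go further and generate specialized code per pattern, which this sketch does not attempt.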