Improving std.regex(p)

Nick Sabalausky a at a.a
Fri Jun 18 22:08:52 PDT 2010


"Andrei Alexandrescu" <SeeWebsiteForEmail at erdani.org> wrote in message 
news:hvetik$2k7j$1 at digitalmars.com...
> There are currently two regexen in the standard library. The older one, 
> std.regexp, is time-tested but only works with UTF-8 and has a clunkier 
> API. The newer one, std.regex, isolates the engine from the 
> matches (and can therefore reuse and cache engines more easily), and supports 
> all character widths. But it's less tested and doesn't have that great of 
> an interface, because it pretty much inherits the existing one.
>
> I wish to improve regex handling in Phobos. The most important improvement 
> is not in the interface - it's in the engine. The current engine is 
> adequate but nothing to write home about, and for simple regexen is 
> markedly slower than equivalent hand-written code (e.g. matching 
> whitespace). One great opportunity would be for D to leverage its uncanny 
> compile-time evaluation abilities and offer a regex that parses the 
> pattern during compilation:
>
> foreach (s; splitter(line, sregex!",[ \t\r]*")) { ... }
>
> Such a static regex could be simpler than a full-blown regex with captures 
> and backreferences etc., but it would have guaranteed performance (e.g. it 
> would be an automaton instead of a backtracking engine) and would be darn 
> fast because it would generate custom code for each regex pattern.
>
> See related work:
>
> http://google-opensource.blogspot.com/2010/03/re2-principled-approach-to-regular.html
>
> If we get as far as implementing what RE2 can do with compile-time 
> evaluation, people will definitely notice.
>
> If there's anyone who'd want to tackle such a project (for Phobos or not), 
> I highly encourage you to do so.
>
>

This would be a good thing for such an engine to pay attention to, if it 
doesn't already (Russ Cox, "Regular Expression Matching Can Be Simple And Fast"):
http://swtch.com/~rsc/regexp/regexp1.html
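The core idea of that article is Thompson's construction: instead of backtracking, the matcher tracks the set of NFA states that are live after each input character, so matching time is bounded by roughly (text length × pattern size). A minimal sketch in D, assuming a hypothetical toy pattern language of literal characters, each optionally followed by `?` (no alternation, star, or captures). `Item`, `compile`, `addState` and `fullMatch` are illustrative names, not Phobos APIs:

```d
import std.algorithm : swap;

// Hypothetical toy pattern language: literal chars, each optionally followed by '?'.
struct Item
{
    char c;        // literal to match
    bool optional; // true if the literal was followed by '?'
}

// Parse the pattern into items. NFA state k means "the first k items are matched".
Item[] compile(string pat)
{
    Item[] items;
    for (size_t i = 0; i < pat.length; ++i)
    {
        immutable opt = i + 1 < pat.length && pat[i + 1] == '?';
        items ~= Item(pat[i], opt);
        if (opt)
            ++i; // skip the '?'
    }
    return items;
}

// Add state k plus every state reachable from it by epsilon moves
// (skipping optional items). Stops early if a state is already present,
// since its closure was added at the same time.
void addState(bool[] set, const(Item)[] items, size_t k)
{
    while (!set[k])
    {
        set[k] = true;
        if (k < items.length && items[k].optional)
            ++k; // epsilon move: skip optional item k
        else
            break;
    }
}

// Anchored full match by state-set simulation: no backtracking, and each
// input character costs at most O(items.length) work.
bool fullMatch(string pat, string text)
{
    auto items = compile(pat);
    auto cur = new bool[items.length + 1];
    auto next = new bool[items.length + 1];
    addState(cur, items, 0);
    foreach (char ch; text)
    {
        next[] = false;
        foreach (k; 0 .. items.length)
            if (cur[k] && items[k].c == ch)
                addState(next, items, k + 1);
        swap(cur, next);
    }
    return cur[items.length]; // accepting state: all items matched
}

unittest
{
    import std.array : replicate;

    assert(fullMatch("a?b", "b"));
    assert(fullMatch("a?b", "ab"));
    assert(!fullMatch("a?b", "aab"));
    // The article's pathological case a?^n a^n against a^n.
    assert(fullMatch("a?a?a?aaa", "aaa"));
    assert(!fullMatch("a?a?a?aaa", "aa"));
    enum n = 30;
    assert(fullMatch(replicate("a?", n) ~ replicate("a", n), replicate("a", n)));
}
```

Compiling with `dmd -unittest -main` runs the checks. The n = 30 case is the one the article uses to show backtracking blowing up: a naive backtracking matcher may try exponentially many ways to assign the optional `a`s, while this simulation only ever keeps one flag per state.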




