Improving std.regex(p)

Andrei Alexandrescu SeeWebsiteForEmail at erdani.org
Fri Jun 18 10:49:09 PDT 2010


Lars T. Kyllingstad wrote:
> On Thu, 17 Jun 2010 21:44:03 -0700, Andrei Alexandrescu wrote:
> 
>> There are currently two regexen in the standard library. The older one,
>> std.regexp, is time-tested but only works with UTF8 and has a clunkier
>> API. The newer one, std.regex, isolates the engine from the matches
>> (and can therefore reuse and cache engines more easily), and supports
>> all character widths. But it's less tested and doesn't have that great
>> of an interface because it pretty much inherits the existing one.
>>
>> I wish to improve regex handling in Phobos. The most important
>> improvement is not in the interface - it's in the engine. The current
>> engine is adequate but nothing to write home about, and for simple
>> regexen is markedly slower than equivalent hand-written code (e.g.
>> matching whitespace). One great opportunity would be for D to leverage
>> its uncanny compile-time evaluation abilities and offer a regex that
>> parses the pattern during compilation:
>>
>> foreach (s; splitter(line, sregex!",[ \t\r]*")) { ... }
>>
>> Such a static regex could be simpler than a full-blown regex with
>> captures and backreferences etc., but it would have guaranteed
>> performance (e.g. it would be an automaton instead of a backtracking
>> engine) and would be darn fast because it would generate custom code for
>> each regex pattern.
>>
>> See related work:
>>
>> http://google-opensource.blogspot.com/2010/03/re2-principled-approach-to-regular.html
>>
>> If we get as far as implementing what RE2 can do with compile-time
>> evaluation, people will definitely notice.
>>
>> If there's anyone who'd want to tackle such a project (for Phobos or
>> not), I highly encourage you to do so.
> 
> 
> There is the 'scregexp' project on dsource:
> 
>     http://www.dsource.org/projects/scregexp/
> 
> It's D1/Tango, but maybe it could be adapted to D2/Phobos?  It would at 
> least serve as a starting point for anyone wanting to try their hand at 
> doing this.

scregexp includes the following requirement within the license:

"Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution."

That would need to be changed before inclusion in Phobos. The copyright
notice appears to list three people: Walter Bright, Marton Papp, and
yidabu. Does anyone know Marton's email address?


Andrei

