Improving std.regex(p)
Don
nospam at nospam.com
Fri Jun 18 07:53:21 PDT 2010
Jacob Carlborg wrote:
> On 2010-06-18 06:44, Andrei Alexandrescu wrote:
>> There are currently two regexen in the standard library. The older one,
>> std.regexp, is time-tested but only works with UTF-8 and has a clunkier
>> API. The newer one, std.regex, isolates the engine from the matches
>> (and can therefore reuse and cache engines more easily), and supports
>> all character widths. But it's less tested and doesn't have that great
>> of an interface because it pretty much inherits the existing one.
>>
>> I wish to improve regex handling in Phobos. The most important
>> improvement is not in the interface - it's in the engine. The current
>> engine is adequate but nothing to write home about, and for simple
>> regexen is markedly slower than equivalent hand-written code (e.g.
>> matching whitespace). One great opportunity would be for D to leverage
>> its uncanny compile-time evaluation abilities and offer a regex that
>> parses the pattern during compilation:
>>
>> foreach (s; splitter(line, sregex!",[ \t\r]*")) { ... }
>>
>> Such a static regex could be simpler than a full-blown regex with
>> captures and backreferences etc., but it would have guaranteed
>> performance (e.g. it would be an automaton instead of a backtracking
>> engine) and would be darn fast because it would generate custom code for
>> each regex pattern.
>>
>> See related work:
>>
>> http://google-opensource.blogspot.com/2010/03/re2-principled-approach-to-regular.html
>>
>>
>>
>> If we get as far as implementing what RE2 can do with compile-time
>> evaluation, people will definitely notice.
>>
>> If there's anyone who'd want to tackle such a project (for Phobos or
>> not), I highly encourage you to do so.
>>
>>
>> Andrei
>
> There is already a compile-time regular expression engine available in the
> DDL project at dsource, it's in the meta package:
> http://dsource.org/projects/ddl/browser/trunk/meta I don't know if it's
> still usable.
>
I wrote that code before CTFE was available. It helped to design D's
metaprogramming system, but the code itself is thoroughly obsolete now.
It is SO much easier to write that stuff with CTFE.

Incidentally, now that we have the no-brackets form of template
instantiation, there's much more freedom for compile-time regex syntax.

    auto k = regex!"[A-Za-z][0-9A-Za-z]*"(s1);
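
To give a rough idea of how little code this takes with CTFE, here is a
minimal sketch. It is not std.regex and not the old DDL meta code; the names
Item, parsePattern, inClass and matchPrefix are made up for illustration. It
only understands a sequence of [..] character classes, each optionally
followed by '*', and the '*' is greedy with no backtracking, so it is only
correct when a starred class does not overlap the class that follows it. The
pattern is parsed by an ordinary function that the compiler runs during
compilation, and static foreach (a feature of later D compilers) then unrolls
one chunk of matching code per pattern element.

```d
import std.stdio : writeln;

// One pattern element: a character class, optionally repeated with '*'.
struct Item
{
    string ranges; // flattened pairs: lo, hi, lo, hi, ...
    bool star;     // true if the class is followed by '*'
}

// Parses the tiny pattern subset described above. This is an ordinary
// function; it runs under CTFE when its result is needed at compile time,
// and a failed assert then becomes a compile error.
Item[] parsePattern(string pat)
{
    Item[] items;
    size_t i = 0;
    while (i < pat.length)
    {
        assert(pat[i] == '[', "only [..] classes are supported in this sketch");
        ++i;
        string ranges;
        while (i < pat.length && pat[i] != ']')
        {
            char lo = pat[i];
            char hi = lo;
            if (i + 2 < pat.length && pat[i + 1] == '-' && pat[i + 2] != ']')
            {
                hi = pat[i + 2]; // a range such as A-Z
                i += 3;
            }
            else
                ++i;             // a single character
            ranges ~= lo;
            ranges ~= hi;
        }
        assert(i < pat.length, "unterminated character class");
        ++i; // skip ']'
        bool star = i < pat.length && pat[i] == '*';
        if (star)
            ++i;
        items ~= Item(ranges, star);
    }
    return items;
}

// True if c falls inside any lo..hi pair of the class.
bool inClass(string ranges, char c)
{
    for (size_t k = 0; k + 1 < ranges.length; k += 2)
        if (c >= ranges[k] && c <= ranges[k + 1])
            return true;
    return false;
}

// Returns the length of the matched prefix of input, or -1 if there is no
// match. The pattern is a template argument, so it is parsed once per
// instantiation at compile time and the loop below is unrolled.
ptrdiff_t matchPrefix(string pat)(const(char)[] input)
{
    size_t pos = 0;
    static foreach (item; parsePattern(pat))
    {
        static if (item.star)
        {
            while (pos < input.length && inClass(item.ranges, input[pos]))
                ++pos;
        }
        else
        {
            if (pos >= input.length || !inClass(item.ranges, input[pos]))
                return -1;
            ++pos;
        }
    }
    return pos;
}

void main()
{
    // No-brackets instantiation, as in the regex!"..." line above.
    alias ident = matchPrefix!"[A-Za-z][0-9A-Za-z]*";
    writeln(ident("x42 = 1")); // prints 3
    writeln(ident("9lives"));  // prints -1

    // The matcher itself can also be evaluated at compile time.
    static assert(matchPrefix!"[A-Za-z][0-9A-Za-z]*"("abc1") == 4);
}
```

A malformed pattern such as matchPrefix!"[A-Z" fails to compile rather than
failing at run time, which is the main attraction of doing the parsing under
CTFE. A real implementation would build an automaton instead of this linear
scan.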