prefix match of a regex and optimized dirEntries for regex search

Timothee Cour via Digitalmars-d digitalmars-d at puremagic.com
Wed Jun 18 10:38:52 PDT 2014


I made a simple modification to std.regex that adds an option to prefix
match a regex.
Formally, if L(R) is the language recognized by a regex R, the language
recognized by prefix matching of R is:

L(p(R)) = prefix(L(R)) = {u : uv in L(R) for some v}

Trying to come up (by hand or algorithmically) with a regex R' such that
L(R') = L(p(R)) is awkward and inefficient, e.g.:

R='hello';
R'=`|h|he|hel|hell|hello` = `(h(e(l(l(o)?)?)?)?)?`;
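
To make the awkwardness concrete, here is a minimal sketch of the by-hand construction for the literal case only. It is written in Python rather than D so that it does not depend on std.regex; the helper name prefix_regex_for_literal is made up for this illustration, and the approach does not extend to general regexes.

```python
import re

def prefix_regex_for_literal(literal):
    """Build the nested-optional pattern for a literal string.

    'hello' becomes '(h(e(l(l(o)?)?)?)?)?'. Literals only: this is the
    by-hand construction, not a general prefix-matching technique.
    """
    pattern = ""
    # Wrap from the last character outwards so each tail is optional.
    for ch in reversed(literal):
        pattern = "(" + re.escape(ch) + pattern + ")?"
    return pattern

def all_prefixes(word):
    """Return every prefix of word, including '' and word itself."""
    return [word[:i] for i in range(len(word) + 1)]

pattern = prefix_regex_for_literal("hello")
compiled = re.compile(pattern)

# Every prefix of 'hello' is accepted as a full match of R'.
accepted = [p for p in all_prefixes("hello") if compiled.fullmatch(p)]

# Strings that are not prefixes of 'hello' are rejected.
rejected = [s for s in ["helo", "hellox", "ello"] if not compiled.fullmatch(s)]
```

Even in this simplest case the pattern nests one group per character, which is the cost the post is pointing at.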

However, thinking in terms of the state machine, this is much easier and
more efficient.

With the modification, usage looks like this:
assert("hel".match(`hello\d+`.regex("p"))); //p for prefix match
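
The state-machine view can be sketched independently of std.regex. The Python below is not the modification described in this post; it swaps in Brzozowski derivatives over a tiny hand-built regex representation (all names here are hypothetical). The point it illustrates: after consuming the input u, a full match asks whether the remaining language contains the empty string, while a prefix match only asks whether the remaining language is non-empty.

```python
# Regex nodes as tuples: ("empty",), ("eps",), ("chr", predicate),
# ("cat", a, b), ("alt", a, b), ("star", a).
EMPTY = ("empty",)   # matches nothing
EPS = ("eps",)       # matches only the empty string

def nullable(r):
    """True if the language of r contains the empty string."""
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag == "cat":
        return nullable(r[1]) and nullable(r[2])
    if tag == "alt":
        return nullable(r[1]) or nullable(r[2])
    return False  # "empty" and "chr"

def nonempty(r):
    """True if r matches at least one string (predicates assumed satisfiable)."""
    tag = r[0]
    if tag == "empty":
        return False
    if tag == "cat":
        return nonempty(r[1]) and nonempty(r[2])
    if tag == "alt":
        return nonempty(r[1]) or nonempty(r[2])
    return True  # "eps", "chr", "star"

def deriv(r, c):
    """Regex for the strings v such that c + v is matched by r."""
    tag = r[0]
    if tag in ("empty", "eps"):
        return EMPTY
    if tag == "chr":
        return EPS if r[1](c) else EMPTY
    if tag == "alt":
        return ("alt", deriv(r[1], c), deriv(r[2], c))
    if tag == "star":
        return ("cat", deriv(r[1], c), r)
    # "cat": c is consumed by the left part, or by the right part
    # when the left part can match the empty string.
    left = ("cat", deriv(r[1], c), r[2])
    return ("alt", left, deriv(r[2], c)) if nullable(r[1]) else left

def residual(r, text):
    for c in text:
        r = deriv(r, c)
    return r

def full_match(r, text):
    return nullable(residual(r, text))

def prefix_match(r, text):
    # text is a prefix of some matched string iff something can still follow.
    return nonempty(residual(r, text))

def lit(s):
    r = EPS
    for ch in s:
        r = ("cat", r, ("chr", lambda c, ch=ch: c == ch))
    return r

DIGIT = ("chr", lambda c: c in "0123456789")
# Equivalent of hello\d+
HELLO_DIGITS = ("cat", lit("hello"), ("cat", DIGIT, ("star", DIGIT)))
```

With these definitions, prefix_match(HELLO_DIGITS, "hel") holds while full_match(HELLO_DIGITS, "hel") does not, which mirrors the assert above. No rewritten regex R' is needed; the same machine is queried with a weaker acceptance test.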

If there's interest in adding this I can prepare a pull request and we can
discuss more.

Example use case:
I wrote a function to search for files matching a regex, and it is
optimized to prune directories early on if they fail to prefix match the
regex, e.g.:

dirEntriesOptimized(`abc/folder_\d+/\w+\.cpp`)
when encountering `abc/bad_subfolder/` it will not recurse into it, as
that path fails the prefix regex match.
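
The pruning idea can be approximated without prefix matching, which gives a rough picture of what the traversal does. The Python sketch below is not dirEntriesOptimized; it is a stand-in that assumes every `/` in the pattern is a literal path separator (never inside a group or character class), splits the pattern into one regex per path component, and only descends into directories whose name matches the component at that depth. The function name and the scanned parameter are hypothetical.

```python
import os
import re
import tempfile

def dir_entries_pruned(root, pattern, scanned=None):
    """Yield '/'-separated paths relative to root that match pattern.

    Assumption: pattern can be split on '/' into per-component regexes.
    If scanned is a list, every directory actually listed is appended to it.
    """
    parts = [re.compile(p) for p in pattern.split("/")]

    def walk(directory, depth, rel):
        if scanned is not None:
            scanned.append(directory)
        with os.scandir(directory) as it:
            entries = sorted(it, key=lambda e: e.name)
        for entry in entries:
            if not parts[depth].fullmatch(entry.name):
                continue  # prune: nothing below this entry can match
            path = rel + [entry.name]
            if depth == len(parts) - 1:
                yield "/".join(path)
            elif entry.is_dir():
                yield from walk(entry.path, depth + 1, path)

    yield from walk(root, 0, [])

# Small demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as root:
    for rel in ["abc/folder_1/main.cpp", "abc/folder_1/notes.txt",
                "abc/bad_subfolder/x.cpp", "other/folder_2/y.cpp"]:
        full = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(full), exist_ok=True)
        open(full, "w").close()
    scanned = []
    found = list(dir_entries_pruned(root, r"abc/folder_\d+/\w+\.cpp", scanned))
    scanned = [os.path.relpath(d, root).replace(os.sep, "/") for d in scanned]
```

In the demonstration, `abc/bad_subfolder` and `other` are never listed. A real prefix match on the whole path removes the need for the one-regex-per-component assumption, which is the advantage of the proposed option.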