prefix match of a regex and optimized dirEntries for regex search

Dmitry Olshansky via Digitalmars-d digitalmars-d at puremagic.com
Wed Jun 18 12:37:11 PDT 2014


18-Jun-2014 21:38, Timothee Cour via Digitalmars-d пишет:
> I made a simple modification to std.regex to allow an option to prefix
> match a regex.
> Formally, if L(R) is the language recognized by a regex R, the language
> recognized by prefix matching of R  is:
>
> L(p(R)) = prefix(L(R)) = {u : uv in L(R) for some v}
> Trying to come up (by hand or algorithmically) with a regex R' such that
> L(R') L(p(R)) is awkward and inefficient, eg:
>
So essentially it's just a set of all partial matches?

> R='hello';
> R'=`|h|he|hell|hello` = `(h(e(l(l(o)?)?)?)?)?`;
>
> However thinking in terms of state machine this is much easier and
> efficient.
>
> It looks like this:
> assert("hel".match(`hello\d+`.regex("p")); //p for prefix match
>

Well, since "p" would apply to operation rather then regex I think 
adding a partialMatch function would be simpler.

Plus it may need to return extended information like where in the 
pattern it stopped, so the return type/API may need to be different from 
normal match.

> If there's interest in adding this I can prepare a pull request and we
> can discuss more.
>
Sure partial matching wold be nice to have and any help in this 
direction is warmly welcome.

Would be cool if you could take a look at
https://github.com/D-Programming-Language/phobos/pull/2216

It's nothing special except splitting up std.regex into a package but 
it's a stepping stone of going forward with it.

> Example use case:
> I wrote a function to search a file given a regex, and it is optimized
> to prune directories early on if they fail to prefix match the regex, eg:

I'm not sure I follow - a normal regex match would do essentially the 
same by first checking the prefix and failing on wrong ones. Where is 
optimization?

What could be interesting is to limit matching to not go beyond say 80 
code units and report if it's (partial) match of given pattern.
>
> dirEntriesOptimized(`abc/folder_\d+/\w+\.cpp`)
> when encountering `abc/bad_subfolder/` it will not recurse on this as it
> fails the prefix regex match.



-- 
Dmitry Olshansky


More information about the Digitalmars-d mailing list