Question about using regex

Dmitry Olshansky dmitry.olsh at gmail.com
Wed Mar 21 10:13:33 PDT 2012


On 21.03.2012 20:05, James Oliphant wrote:
> While following the regex discussion, I have been compiling the examples
> to help with my understanding of how it works.
>
>  From Dmitry's example page:
> 	http://blackwhale.github.com/regular-expression.html
> and from the dlang.org website:
> 	http://dlang.org/phobos/std_regex.html
>
> std.regex.replace calls a delegate
> 	auto delegate(Captures!string)
> which does not compile.  The definition in Phobos for Captures is
> 	struct Captures(R,DIndex)
> and for the purposes of these examples changing the delegate to
> 	auto delegate(Captures!(string,uint))
> seems to work.  Is this correct?
>

Mm-hm, it means the fix to use size_t by default is upstream but not in 
2.058, I think. Users should not need to specify the index type; it is a 
hook for future extension.
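
A minimal sketch of the intended usage, where the callback names only the
input type. It assumes a Phobos release newer than 2.058 that includes the
fix and provides replaceAll; the helper name shout is made up for
illustration.

```d
import std.regex;
import std.stdio;
import std.string : toUpper;

// Hypothetical helper: upper-cases every match of `pattern` in `input`.
string shout(string input, string pattern)
{
    // Only the input type is spelled out; the index type is left to the library.
    string up(Captures!string m) { return toUpper(m.hit); }
    return replaceAll!(up)(input, regex(pattern));
}

void main()
{
    writeln(shout("Ranges are hot!", `\w+`)); // prints "RANGES ARE HOT!"
}
```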

>
> In another example on Dmitry's page that starts:
> 	auto m = match("Ranges are hot!", r"(\w)\w*(\w)"); //at least 3 "word" symbols
> The output from the example is "Ranges, R, s", but I don't quite
> understand why those were the matches in this case.


Ok, \w matches any single word character, that is alpha, numeric or one 
of a few other oddities*.
Now (\w) captures 1 character into the 1st _submatch_ ('R').
\w* consumes the rest of the word, then backtracks one character so that 
the next (\w) can match.
The second (\w) thus captures the last char ('s') into the 2nd _submatch_.
The captures property lists the submatches captured during one match; 
[0] is the whole match.
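
The steps above can be checked with a short sketch that collects the whole
match and its submatches into an array. The helper name submatches is
hypothetical; match, regex and captures are the std.regex calls already
used in the example.

```d
import std.regex;
import std.stdio;

// Hypothetical helper: returns [whole match, 1st submatch, 2nd submatch, ...]
// for the first match of `pattern` in `input`, or null if there is no match.
string[] submatches(string input, string pattern)
{
    auto m = match(input, regex(pattern));
    if (m.empty)
        return null;
    string[] result;
    foreach (s; m.captures) // [0] is the whole match, then each submatch
        result ~= s;
    return result;
}

void main()
{
    // Only the first match ("Ranges") is inspected, not "are" or "hot".
    writeln(submatches("Ranges are hot!", `(\w)\w*(\w)`));
    // prints ["Ranges", "R", "s"]
}
```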

I get that people tend to think I was about to show multiple 
_matches_ here, but that belongs to the next chapter. Here I was just 
showing how to work with submatches; that needs to be stressed somehow.


*This is an enormously useful tool for getting info on Unicode stuff, and 
regex in particular:
http://unicode.org/cldr/utility/index.jsp


> Also does the
> regular expression imply match at least 2 "word" symbols where \w* means
> match 0 or more "word" symbols?

Yup, that's right: at least 2. I should correct the wording.
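
A quick sketch to confirm the lower bound, using made-up inputs of one,
two and three word characters. The helper name hasMatch is hypothetical.

```d
import std.regex;
import std.stdio;

// Hypothetical helper: true if `input` contains a match for (\w)\w*(\w).
bool hasMatch(string input)
{
    return !match(input, regex(`(\w)\w*(\w)`)).empty;
}

void main()
{
    // \w* may match nothing, so two word characters are enough.
    foreach (word; ["a", "ab", "abc"])
        writeln(word, ": ", hasMatch(word));
    // prints:
    // a: false
    // ab: true
    // abc: true
}
```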

>
> These newsgroups are a great resource, keep up the great work!

You are welcome.

-- 
Dmitry Olshansky

