Question about using regex

Dmitry Olshansky dmitry.olsh at gmail.com
Wed Mar 21 10:22:53 PDT 2012


On 21.03.2012 21:13, Dmitry Olshansky wrote:
> On 21.03.2012 20:05, James Oliphant wrote:
>> While following the regex discussion, I have been compiling the examples
>> to help with my understanding of how it works.
>>
>> From Dmitry's example page:
>> http://blackwhale.github.com/regular-expression.html
>> and from the dlang.org website:
>> http://dlang.org/phobos/std_regex.html
>>
>> std.regex.replace calls a delegate
>> auto delegate(Captures!string)
>> which does not compile. The definition in Phobos for Captures is
>> struct Captures(R,DIndex)
>> and for the purposes of these examples changing the delegate to
>> auto delegate(Captures!(string,uint))
>> seems to work. Is this correct?
>>
>
> Mm-hm, it means the fix that makes the index type default to size_t is
> upstream, but not in 2.058, I think. Users should not need to specify the
> index type; it is a hook for future extension.
>
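
For reference, a minimal sketch of the callback form of replace. It assumes
a later Phobos release in which Captures takes only the string type and
replaceAll is available (with 2.058 the call is replace with the "g" flag,
and the delegate parameter is spelled Captures!(string,uint) as noted
above). The helper name upperHit is made up for illustration.

```d
import std.regex;
import std.stdio : writeln;
import std.uni : toUpper;

// Hypothetical helper: receives the captures of one match and
// returns the text that should replace that match.
string upperHit(Captures!string m)
{
    return m.hit.toUpper; // m.hit is the whole matched slice
}

void main()
{
    // Upper-case every 'a' and 'r' in the input.
    auto s = replaceAll!(upperHit)("Strings are immutable.", regex("[ar]"));
    writeln(s); // StRings ARe immutAble.
}
```

The index type never appears in this form, which is the point of making it
default to size_t.
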
>>
>> In another example on Dmitry's page that starts:
>> auto m = match("Ranges are hot!", r"(\w)\w*(\w)"); // at least 3 "word" symbols
>> The output from the example is "Ranges, R, s", but I don't quite
>> understand why those were the matches in this case.
>
>
> Ok, \w matches any single word character, that is, alpha, numeric or one
> of a few other oddities*.
> Now the first (\w) captures 1 character into the 1st _submatch_ ('R').
> \w* greedily consumes the rest of the word, then backtracks one character
> so that the next (\w) can match.
> The second (\w) thus captures the last char ('s') into the 2nd _submatch_.
> captures lists the submatches captured during one match; [0] is the whole
> match.
>
> I get it that people tend to think that I was about to show multiple
> _matches_ here, but that belongs to the next chapter. Here I was just
> showing how to work with submatches, that needs to be stressed somehow.
>

Oh wait, it's in this chapter :) I probably should make more noise about
the "g" flag, and separate submatches from the range of matches more cleanly.
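
To make the distinction concrete, here is a rough sketch. It assumes a later
Phobos release that provides matchFirst and matchAll; with the API discussed
in this thread the same split is match without and with the "g" flag. The
helper names firstCaptures and allHits are made up for illustration.

```d
import std.regex;
import std.stdio : writeln;

// Hypothetical helper: submatches of ONE match.
// Index 0 is the whole match, 1 and 2 are the two (\w) groups.
string[] firstCaptures(string text)
{
    auto c = matchFirst(text, regex(r"(\w)\w*(\w)"));
    if (c.empty)
        return null;
    return [c[0], c[1], c[2]];
}

// Hypothetical helper: the whole-match text of EVERY match in the input.
string[] allHits(string text)
{
    string[] hits;
    foreach (m; matchAll(text, regex(r"(\w)\w*(\w)")))
        hits ~= m.hit;
    return hits;
}

void main()
{
    writeln(firstCaptures("Ranges are hot!")); // ["Ranges", "R", "s"]
    writeln(allHits("Ranges are hot!"));       // ["Ranges", "are", "hot"]
}
```

The first call walks the submatches of a single match; the second walks the
range of matches, each of which carries its own submatches.
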

>
> *This is an enormously useful tool for getting info on Unicode, and on
> regex in particular:
> http://unicode.org/cldr/utility/index.jsp
>
>
>> Also, does the regular expression imply matching at least 2 "word"
>> symbols, where \w* means match 0 or more "word" symbols?
>
> Yup, that's right: at least 2. I should correct the wording.
>
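
A quick check of the "at least 2" reading, again only a sketch that assumes
a Phobos release with matchFirst; the helper name hasMatch is made up for
illustration.

```d
import std.regex;
import std.stdio : writeln;

// Hypothetical helper: true if the pattern matches anywhere in text.
bool hasMatch(string text)
{
    return !matchFirst(text, regex(r"(\w)\w*(\w)")).empty;
}

void main()
{
    writeln(hasMatch("a"));  // false: the second (\w) has nothing left to match
    writeln(hasMatch("ab")); // true: \w* matches zero characters

    auto c = matchFirst("ab", regex(r"(\w)\w*(\w)"));
    writeln(c[0], ", ", c[1], ", ", c[2]); // ab, a, b
}
```
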
>>
>> These newsgroups are a great resource, keep up the great work!
>
> You are welcome.
>


-- 
Dmitry Olshansky