regex issue

Dmitry Olshansky dmitry.olsh at gmail.com
Mon Mar 19 06:55:37 PDT 2012


On 19.03.2012 17:39, Jay Norwood wrote:
> On Monday, 19 March 2012 at 13:27:03 UTC, Jay Norwood wrote:
>> ok, global. So the document implies that I should be able to get a
>> single match object with a count of the submatches. So I think maybe
>> I've jumped to the wrong conclusion about how to use it, thinking I
>> could just use "\n" and "g" flag to get all the matches for the range
>> of "\n". So it looks like instead that the term "submatches" needs
>> more explanation. What exactly constitutes a submatch? I inferred it
>> just meant any single match among many.
>>
>> //create static regex at compile-time, contains fast native code
>> enum ctr = ctRegex!(`^.*/([^/]+)/?$`);
>>
>> //works just like normal regex:
>> auto m2 = match("foo/bar", ctr); //first match found here if any
>> assert(m2); // be sure to check if there is a match before examining contents!
>> assert(m2.captures[1] == "bar"); //captures is a range of submatches, 0 - full match
>>
>>
>> btw, I couldn't get this \p option to work for the uni properties. Can
>> you provide some example of that which works?
>>
>> \p{PropertyName} Matches character that belongs to unicode
>> PropertyName set. Single letter abbreviations could be used without
>> surrounding {,}.
>
>
> so, to answer my own question, it appears that the (regex) is the
> portion that is considered a submatch that gets counted.
>
> so counting lines would be something that has a (\n) in it, although
> I'll have to figure out what that will be exactly.

That's right. Counting, however, is completely separate from regex; you'd
want to use count from std.algorithm, with the "g" flag so that every match
is visited rather than only the first:
count(match(...., regex("\n","g")));

or, more Unicode-friendly:
count(match(...., regex("$","gm"))); //note the multi-line flag
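
As a minimal sketch of the two counting approaches above (the helper names countNewlines and countLineEnds are made up for illustration, and the "g" flag is assumed to be what makes match yield every hit):

```d
import std.algorithm : count;
import std.regex : match, regex;

/// Counts "\n" characters in `text` (illustrative helper, not part of Phobos).
size_t countNewlines(string text)
{
    // "g" asks match for every occurrence, not just the first one.
    return count(match(text, regex("\n", "g")));
}

/// Counts line ends using `$` in multi-line mode (illustrative helper).
size_t countLineEnds(string text)
{
    // "m" lets `$` match at each line end; "g" again visits all of them.
    return count(match(text, regex("$", "gm")));
}
```

For example, countNewlines("a\nb\nc") should give 2. The `$` variant also reports the end of the last line, so the two counts need not agree.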

Also observe that there is simply no way to get more than a constant
number of submatches: the number is fixed by the parenthesized groups in
the pattern, not by the input.
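
A small sketch of that point, assuming captures exposes a length property as in current std.regex (submatchCount is a hypothetical helper written only for this illustration):

```d
import std.regex : match, regex;

/// Returns the number of submatches reported for the first match of
/// `pattern` in `input`, or 0 when nothing matches (illustrative helper).
size_t submatchCount(string input, string pattern)
{
    auto m = match(input, regex(pattern));
    // captures[0] is the full match; each (group) adds exactly one entry.
    return m.empty ? 0 : m.captures.length;
}
```

With the pattern `^.*/([^/]+)/?$` from the documentation, both "foo/bar" and "a/b/c/d/e" should report 2 submatches (the full match plus the one group), however many slashes the input contains.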

>
> (regex) Matches subexpression regex, saving matched portion of text for
> later retrieval.
>

An example of Unicode properties:
\p{WhiteSpace} matches any Unicode whitespace character
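
A hedged sketch of using that property in a pattern; countWhitespace is a made-up helper, and it assumes the property name is accepted exactly as spelled above:

```d
import std.algorithm : count;
import std.regex : match, regex;

/// Counts characters that carry the Unicode whitespace property
/// (illustrative helper, not part of Phobos).
size_t countWhitespace(string text)
{
    // The backquoted string keeps the backslash literal; "g" visits every match.
    return count(match(text, regex(`\p{WhiteSpace}`, "g")));
}
```

This should count not only the ASCII space but also characters such as the no-break space (U+00A0) and the em space (U+2003).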


-- 
Dmitry Olshansky


More information about the Digitalmars-d-learn mailing list