Restrictions in std.regexp?

Olaf Pohlmann op at nospam.org
Tue May 2 08:10:38 PDT 2006


Derek Parnell wrote:
> Are you looking for an optional "AB" followed by "CD" followed by an  
> optional "EF" ?

No. I'm looking for a string that is preceded and followed by other, 
well-defined strings. The match should *not* return the whole sequence, 
but only the part in the middle. It's actually about parsing some kind of 
text markup. If it were HTML like "<body><h1>Welcome</h1></body>", it 
should let me retrieve only the "Welcome". If you just use grouping, 
the match will be the whole <h1> element, so you have to 
extract the content in a second step. A regexp with lookahead and 
lookbehind works fine in Python:

import re
html = "<body>\n<h1>Welcome</h1>\n</body>"
# (?<=<h1>) and (?=</h1>) assert the surrounding tags without consuming them.
match = re.search(r"(?<=<h1>).*?(?=</h1>)", html)
print(html[match.start():match.end()])

This prints 'Welcome'.

The regexp is a bit hard to read, so see 
http://docs.python.org/lib/re-syntax.html for a description.
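
Since the pattern is hard to read on one line, here is a sketch of the same 
regexp written with Python's re.VERBOSE flag, so that each part carries its 
own comment. The names H1_CONTENT and content are only illustrative, and 
this is about Python's re module, not std.regexp.

```python
import re

# Same lookbehind/lookahead pattern, spelled out with re.VERBOSE comments.
H1_CONTENT = re.compile(
    r"""
    (?<=<h1>)   # lookbehind: the match must be preceded by "<h1>"
    .*?         # the content itself, matched non-greedily
    (?=</h1>)   # lookahead: the match must be followed by "</h1>"
    """,
    re.VERBOSE,
)

html = "<body>\n<h1>Welcome</h1>\n</body>"
match = H1_CONTENT.search(html)
# The lookarounds consume nothing, so the whole match is just the content.
content = match.group(0) if match else None
print(content)
```

The lookbehind and lookahead only assert what surrounds the match, which is 
why the match itself contains nothing but the text between the tags.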

Now, I can retrieve the whole h1 element with the D version of regexps 
and then do another scan for the content, but it would be nice to get it 
in one step, as in the Python version.
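
For comparison, this is roughly what the grouping approach looks like in 
Python. It is only a sketch to show the difference between the whole match 
and a captured group. The names whole and inner are illustrative, and I'm 
not claiming anything here about how std.regexp exposes groups.

```python
import re

html = "<body>\n<h1>Welcome</h1>\n</body>"

# With a capturing group the tags are part of the overall match.
match = re.search(r"<h1>(.*?)</h1>", html)

whole = match.group(0)  # the entire <h1> element, tags included
inner = match.group(1)  # only the text captured by the parentheses
print(whole)
print(inner)
```

Here the overall match is the full element and the content has to be pulled 
out of the group, whereas with lookahead and lookbehind the overall match is 
already the content.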


op



More information about the Digitalmars-d-learn mailing list