Restrictions in std.regexp?

Tue May 2 08:10:38 PDT 2006

Derek Parnell wrote:
> Are you looking for an optional "AB" followed by "CD" followed by an  
> optional "EF" ?

No. I'm looking for a string that is preceded and followed by other 
well-defined strings. The match should *not* return the whole sequence 
but only what is in the middle. It's actually about parsing some kind of 
text markup. If it were HTML like "<body><h1>Welcome</h1></body>", it 
should allow me to retrieve only the "Welcome". If you just use 
grouping, the match will be the whole <h1> element, so you have to 
extract the content in a second step. The regexp with lookahead and 
lookbehind works fine in Python:

import re
html = "<body>\n<h1>Welcome</h1>\n</body>"
# lookbehind (?<=<h1>) and lookahead (?=</h1>) are not part of the match
m = re.search(r"(?<=<h1>).*?(?=</h1>)", html)
print(html[m.start():m.end()])

This prints 'Welcome'.

The regexp is a bit hard to read, so see 
http://docs.python.org/lib/re-syntax.html for a description.

Now, I can retrieve the whole h1 element with the D version of regexps 
and then do another scan for the content, but it would be nice to get it 
in one step, like in the Python version.
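
To make the difference concrete, here is a minimal Python sketch that puts 
the grouping variant next to the lookaround one. The variable names 
(whole, content, inner) are only illustrative, and it says nothing about 
what std.regexp supports; it just shows the two behaviours I mean.

```python
import re

html = "<body>\n<h1>Welcome</h1>\n</body>"

# Grouping variant: the overall match is the whole <h1> element,
# and the text between the tags has to be pulled out of group 1.
m = re.search(r"<h1>(.*?)</h1>", html)
whole = m.group(0)    # the full element, tags included
content = m.group(1)  # only the text between the tags

# Lookaround variant: the tags are asserted but not consumed,
# so the overall match is already just the text in the middle.
m2 = re.search(r"(?<=<h1>).*?(?=</h1>)", html)
inner = m2.group(0)

print(whole)
print(content)
print(inner)
```

With grouping the content is still reachable, but only through the group, 
not as the match itself, which is the extra step I'd like to avoid.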

op