Restrictions in std.regexp?
Olaf Pohlmann
op at nospam.org
Tue May 2 08:10:38 PDT 2006
Derek Parnell wrote:
> Are you looking for an optional "AB" followed by "CD" followed by an
> optional "EF" ?
No. I'm looking for a string that is preceded and followed by other,
well-defined strings. The match should *not* return the whole sequence
but only what is in the middle. It's actually about parsing some kind of
text markup. If it were HTML like "<body><h1>Welcome</h1></body>", it
should allow me to retrieve only the "Welcome". If you just use some
grouping, the match will be the whole <h1> element, so you have to
extract the content in a 2nd step. The regexp with lookahead and
lookbehind works fine in Python:
import re
html = "<body>\n<h1>Welcome</h1>\n</body>"
# Lookbehind (?<=...) and lookahead (?=...) assert the tags without consuming them.
match = re.search(r"(?<=<h1>).*?(?=</h1>)", html)
print(html[match.start():match.end()])
This prints 'Welcome'.
The regexp is a bit hard to read, so see
http://docs.python.org/lib/re-syntax.html for a description.
Now, I can retrieve the whole h1 element with the D version of regexps
and then do another scan for the content but it would be nice to get it
in one step, like in the Python version.
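
To make the difference concrete, here is a minimal sketch in Python that compares plain grouping with the lookaround version. The variable names are only illustrative, and it says nothing about what std.regexp itself supports:

```python
import re

html = "<body>\n<h1>Welcome</h1>\n</body>"

# Plain grouping: the overall match spans the tags,
# and the content has to be pulled out of group 1 afterwards.
grouped = re.search(r"<h1>(.*?)</h1>", html)
whole_element = grouped.group(0)
content_via_group = grouped.group(1)

# Lookbehind/lookahead: the tags are asserted but not consumed,
# so the overall match is already just the content.
lookaround = re.search(r"(?<=<h1>).*?(?=</h1>)", html)
content_via_lookaround = lookaround.group(0)

print(whole_element)
print(content_via_group)
print(content_via_lookaround)
```

With grouping, the overall match is the whole element and the content is only available as a sub-match. With lookaround, the overall match is the content itself.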
op