My simple implementation of PHP strip_tags()
Ola Fosheim Grøstad via Digitalmars-d
digitalmars-d at puremagic.com
Mon Jul 3 01:29:15 PDT 2017
On Thursday, 29 June 2017 at 05:30:28 UTC, Patrick Schluter wrote:
> Ouch, parsing html or xml with regular expressions is
> problematic.
> What people generally don't realize is that the > is not
> required to be encoded as an entity when it appears in data. This means
> that <thing attr="Hello >"> or
> <data>></data> are absolutely legal. Regular expressions may
> break when they encounter them.
Yes, and that is only the beginning: "<" is also legal inside a
CDATA section, and elements can be encoded as entities and
therefore be hidden in the main text. I'm sure there are more
gotchas. So, if you parse XML, use a real XML parser.
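To make the two failure modes above concrete, here is a small sketch in Python (not the D code from this thread, and not PHP's strip_tags()). It compares a typical naive pattern, `<[^>]*>`, with stripping done by a real XML parser (`xml.etree.ElementTree`). The helper names `strip_tags_regex` and `strip_tags_xml` are hypothetical, and the parser version assumes the input is a well-formed XML fragment, which it wraps in a dummy root element.

```python
import re
import xml.etree.ElementTree as ET

# Naive approach: delete anything that looks like "<...>".
NAIVE_TAG = re.compile(r"<[^>]*>")


def strip_tags_regex(markup: str) -> str:
    """Regex-based stripping (hypothetical helper).

    Breaks on legal XML: the pattern stops at the first '>', even one
    inside an attribute value, and it swallows CDATA sections whole.
    """
    return NAIVE_TAG.sub("", markup)


def strip_tags_xml(markup: str) -> str:
    """Parser-based stripping (hypothetical helper).

    Assumes `markup` is a well-formed XML fragment. It is wrapped in a
    dummy <root> so fragments with several top-level nodes still parse.
    CDATA content is reported by the parser as ordinary character data.
    """
    root = ET.fromstring("<root>" + markup + "</root>")
    return "".join(root.itertext())


if __name__ == "__main__":
    samples = [
        '<thing attr="Hello >">text</thing>',
        "<code><![CDATA[if (a < b) x = 1;]]></code>",
    ]
    for s in samples:
        print(repr(strip_tags_regex(s)), "vs", repr(strip_tags_xml(s)))
```

For the first sample the regex leaves the stray fragment `">text` behind, and for the second it deletes the CDATA content entirely, while the parser returns `text` and `if (a < b) x = 1;`. Input that is not well-formed (typical real-world HTML) makes `ET.fromstring` raise `ParseError`; for that case an HTML parser rather than an XML one would be the appropriate tool.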