My simple implementation of PHP strip_tags()

Ola Fosheim Grøstad via Digitalmars-d digitalmars-d at puremagic.com
Mon Jul 3 01:29:15 PDT 2017


On Thursday, 29 June 2017 at 05:30:28 UTC, Patrick Schluter wrote:
> Ouch, parsing html or xml with regular expressions is 
> problematic.
> What people generally don't realize is that the > is not 
> required to be encoded as entity when in the data. This means 
> that <thing attr="Hello >"> or
> <data>></data> are absolutely legal. Regular expressions may 
> break when they encounter them.

Yes, and that is only the beginning: "<" is also legal inside a 
CDATA section, and elements can be encoded as entities and 
therefore be hidden in the main text. I'm sure there are more 
gotchas. So if you parse XML, use a real XML parser.
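
To make two of these gotchas concrete, here is a minimal sketch in Python, used purely for illustration and not taken from the implementation discussed in this thread. The helper names strip_tags_regex and strip_tags_parser are hypothetical, and the regex is just one typical naive pattern. It compares that pattern with the standard library's xml.etree.ElementTree on the attribute example quoted above and on a CDATA section:

```python
import re
import xml.etree.ElementTree as ET


def strip_tags_regex(text):
    # Naive approach (hypothetical helper): delete everything from a "<"
    # up to the next ">", assuming that ">" always closes the tag.
    return re.sub(r"<[^>]*>", "", text)


def strip_tags_parser(text):
    # Let a real XML parser find the markup, then join the text nodes.
    return "".join(ET.fromstring(text).itertext())


# A ">" inside an attribute value ends the regex match too early.
attr_case = '<thing attr="Hello >">text</thing>'

# A "<" inside a CDATA section is plain character data, not markup.
cdata_case = "<data><![CDATA[a < b]]></data>"

for sample in (attr_case, cdata_case):
    print(repr(strip_tags_regex(sample)), repr(strip_tags_parser(sample)))
```

With the attribute example the regex leaves the stray fragment `">` in front of the text, and with the CDATA example it throws away the character data entirely, while the parser returns `text` and `a < b` respectively.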


