My simple implementation of PHP strip_tags()
Patrick Schluter via Digitalmars-d
digitalmars-d at puremagic.com
Wed Jun 28 22:30:28 PDT 2017
On Wednesday, 28 June 2017 at 18:08:12 UTC, aberba wrote:
> I wanted strip_tags() for sanitization in vibe.d and I set out
> for algorithms on how to do it and came across this JavaScript
> library at
>
> string stripTags(string input, in string[] allowedTags = [])
> {
> import std.regex: Captures, replaceAll, ctRegex;
>
> auto regex = ctRegex!(`</?(\w*)>`);
>
Ouch, parsing HTML or XML with regular expressions is problematic.
What people generally don't realize is that > does not have to be
encoded as an entity when it appears in the data. This means that
<thing attr="Hello >"> and <data>></data> are perfectly legal.
A regular expression that assumes the first > closes the tag may
break when it encounters them.
http://haacked.com/archive/2004/10/25/usingregularexpressionstomatchhtml.aspx/
https://blog.codinghorror.com/parsing-html-the-cthulhu-way/
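
To illustrate the point about > inside attribute values, here is a
minimal sketch of a quote-aware scanner. It is not the code from the
quoted post and not a real HTML parser: the name stripTagsQuoteAware
is hypothetical, it ignores the allowedTags parameter, and it assumes
every < in the input starts a tag (so comments, CDATA sections,
script bodies and a stray < in text are not handled).

```d
import std.array : appender;

/// Hypothetical helper: removes everything between '<' and the
/// matching '>', treating '>' inside a quoted attribute value as data.
string stripTagsQuoteAware(string input)
{
    auto output = appender!string();
    bool inTag = false;
    char quote = '\0'; // '\0' means "not inside a quoted attribute value"

    foreach (char c; input)
    {
        if (!inTag)
        {
            if (c == '<')
                inTag = true;      // assumption: every '<' opens a tag
            else
                output.put(c);     // ordinary data, including a bare '>'
        }
        else if (quote != '\0')
        {
            if (c == quote)
                quote = '\0';      // closing quote of the attribute value
        }
        else if (c == '"' || c == '\'')
        {
            quote = c;             // opening quote of an attribute value
        }
        else if (c == '>')
        {
            inTag = false;         // an unquoted '>' ends the tag
        }
    }
    return output.data;
}

unittest
{
    // The two cases mentioned above.
    assert(stripTagsQuoteAware(`<thing attr="Hello >">text</thing>`) == "text");
    assert(stripTagsQuoteAware(`<data>></data>`) == ">");
}
```

Tracking the quote state is the part a pattern such as `<[^>]*>`
cannot do. For sanitization, a proper HTML parser is still the safer
choice.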