My simple implementation of PHP strip_tags()

Patrick Schluter via Digitalmars-d digitalmars-d at puremagic.com
Wed Jun 28 22:30:28 PDT 2017


On Wednesday, 28 June 2017 at 18:08:12 UTC, aberba wrote:
> I wanted strip_tags() for sanitization in vibe.d and I set out 
> for algorithms on how to do it and came across this JavaScript 
> library at
>
> string stripTags(string input, in string[] allowedTags = [])
> {
> 	import std.regex: Captures, replaceAll, ctRegex;
>
> 	auto regex = ctRegex!(`</?(\w*)>`);
>
Ouch, parsing HTML or XML with regular expressions is problematic.
What people generally don't realize is that > does not have to be 
encoded as an entity when it appears in the data. This means that 
<thing attr="Hello >"> and <data>></data> are both perfectly legal. 
Regular expressions may break when they encounter them.

http://haacked.com/archive/2004/10/25/usingregularexpressionstomatchhtml.aspx/
https://blog.codinghorror.com/parsing-html-the-cthulhu-way/
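
To make the attribute case concrete, here is a minimal sketch. The names naiveStrip and quoteAwareStrip are made up for illustration, and the pattern `<[^>]*>` is a commonly seen tag-stripping regex, not the one from the quoted code. The scanner only tracks quotes inside tags; it is not an HTML parser (no comments, CDATA, or script handling) and a stray < in plain text will confuse it, so for real sanitization a proper parser is still the safer choice.

```d
import std.regex : regex, replaceAll;
import std.stdio : writeln;

/// Naive approach: delete everything from '<' up to the next '>'.
string naiveStrip(string input)
{
    auto tag = regex(`<[^>]*>`);
    return replaceAll(input, tag, "");
}

/// Hand-written scanner that remembers whether it is inside a quoted
/// attribute value, so a '>' inside quotes does not end the tag.
string quoteAwareStrip(string input)
{
    string result;
    bool inTag = false;
    char quote = '\0'; // active quote character inside a tag, '\0' if none

    foreach (char c; input)
    {
        if (!inTag)
        {
            if (c == '<')
                inTag = true;      // start skipping the tag
            else
                result ~= c;       // ordinary data, keep it
        }
        else if (quote != '\0')
        {
            if (c == quote)
                quote = '\0';      // closing quote of the attribute value
        }
        else if (c == '"' || c == '\'')
        {
            quote = c;             // opening quote of an attribute value
        }
        else if (c == '>')
        {
            inTag = false;         // real end of the tag
        }
    }
    return result;
}

void main()
{
    enum sample = `<thing attr="Hello >">text</thing>`;
    writeln(naiveStrip(sample));      // prints: ">text
    writeln(quoteAwareStrip(sample)); // prints: text
}
```

The regex stops at the > inside the attribute value and leaks the rest of the tag into the output, while the scanner only treats > as the end of a tag when it is outside quotes. A bare > in the data, as in <data>></data>, is kept by both functions.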



