For those ready to take the challenge

Adam D. Ruppe via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Sat Jan 10 12:43:45 PST 2015


On Saturday, 10 January 2015 at 19:17:22 UTC, Ola Fosheim Grøstad 
wrote:
> Nice and clean code; does it expand html entities ("&amp")?

Of course. It does it both ways:

<span>a &amp;</span>

span.innerText == "a &"

span.innerText = "a \" b";
assert(span.innerHTML == "a &quot; b");

parseGarbage also tries to fix broken entities, so a & standing 
alone will be translated to &amp; for you. There's also 
parseStrict, which just throws an exception in cases like that.

That's one thing a lot of XML parsers don't do in the name of 
speed, but I do since it is pretty rare that I don't want them 
translated. One thing I did for a speedup though was scan the 
string for & and if it doesn't find one, return a slice of the 
original, and if it does, return a new string with the entity 
translated. Gave a surprisingly big speed boost without costing 
anything in convenience.
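
To illustrate the slice-or-copy idea, here is a minimal sketch. It is not the actual code from the library: decodeEntities is a hypothetical name, and the four-entry table is an assumption standing in for a full entity list.

```d
import std.algorithm.searching : startsWith;
import std.array : appender;
import std.string : indexOf;

// Hypothetical helper for illustration only; it handles just four
// named entities and leaves anything else untouched.
string decodeEntities(string s)
{
    // Fast path: no '&' means there is nothing to decode, so return
    // the original slice without allocating.
    auto amp = s.indexOf('&');
    if (amp == -1)
        return s;

    static immutable string[2][] table = [
        ["&amp;", "&"], ["&lt;", "<"], ["&gt;", ">"], ["&quot;", "\""],
    ];

    // Slow path: build a new string, starting with the text before
    // the first '&', which can be copied over unchanged.
    auto result = appender!string();
    result.put(s[0 .. amp]);

    size_t i = amp;
    outer: while (i < s.length)
    {
        if (s[i] == '&')
        {
            foreach (pair; table)
            {
                if (s[i .. $].startsWith(pair[0]))
                {
                    result.put(pair[1]);
                    i += pair[0].length;
                    continue outer;
                }
            }
        }
        // Not a known entity (including a lone '&'): copy as-is.
        result.put(s[i]);
        i++;
    }
    return result.data;
}
```

With something like this, a string without any & comes back as the very same slice (decodeEntities(s) is s), so the common case costs one scan and no allocation.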

> The HTML5 standard has improved on HTML4 by now being explicit 
> on how incorrect documents shall be interpreted in section 8.2. 
> That ought to be sufficient, since that is what web browsers 
> are supposed to do.
>
> http://www.w3.org/TR/html5/syntax.html#html-parser

Huh, I never read that, my thing just did what looked right to me 
over hundreds of test pages that were broken in various strange 
and bizarre ways.

