regarding Latin1 to UTF8 encoding

Adam D. Ruppe destructionator at gmail.com
Sun Dec 8 19:50:40 PST 2013


On Monday, 9 December 2013 at 03:33:46 UTC, Hugo Florentino wrote:
> Could this work using scope instead of try/catch?

Maybe, but I don't think it would be very pretty. Really, I think 
validate should return a bool instead of throwing, but since it 
doesn't, the try/catch is as close as it gets.
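For illustration, here is a minimal sketch of the bool-returning check described above. The wrapper name isValidUtf8 is hypothetical (it is not part of Phobos); it just hides the try/catch around std.utf.validate, which throws a UTFException when its input is not well-formed UTF-8.

```d
import std.stdio : writeln;
import std.utf : validate, UTFException;

// Hypothetical helper: reports whether `s` is well-formed UTF-8
// instead of throwing. std.utf.validate throws a UTFException on
// malformed input, so the try/catch lives inside this function.
bool isValidUtf8(const(char)[] s)
{
    try
    {
        validate(s);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    char[] latin1 = [cast(char) 0xE9]; // 'é' as a single Latin-1 byte
    writeln(isValidUtf8("caf\u00e9")); // prints true
    writeln(isValidUtf8(latin1));      // prints false
}
```

The caller can then write a plain `if (!isValidUtf8(data))` and handle the non-UTF-8 case without any exception handling of its own.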

> P.S. Nice unit, by the way.

BTW, if you need to parse arbitrary HTML, grab that file and my 
dom.d from the same repo.

import arsd.dom; // dom.d from the same repo

auto document = new Document();
document.parseGarbage(whatever_data); // whatever_data: the raw HTML

parseGarbage tries to determine the character encoding 
automatically: it uses the validate check or the meta tags in the 
HTML if they are there, and guesses if not. It is pretty good at 
parsing broken HTML tag soup into a DOM similar to what a browser 
would build.
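To give a rough idea of the "validate, then fall back" part, here is a sketch that assumes the input is either UTF-8 or Latin-1 (ISO-8859-1). It is not how parseGarbage is actually implemented (that also checks the meta tags and guesses, as described above), and the name toUtf8Guess is hypothetical. It relies on the fact that every Latin-1 byte value equals its Unicode code point (U+0000 to U+00FF).

```d
import std.array : appender;
import std.stdio : writeln;
import std.utf : validate, UTFException;

// Hypothetical helper, not dom.d's parseGarbage: keep the bytes if they
// are already valid UTF-8, otherwise assume Latin-1 and re-encode.
string toUtf8Guess(const(ubyte)[] data)
{
    auto asChars = cast(const(char)[]) data;
    try
    {
        validate(asChars); // throws UTFException on malformed UTF-8
        return asChars.idup;
    }
    catch (UTFException)
    {
        // Latin-1 byte values equal their Unicode code points, so each
        // byte widens directly to a dchar; the string appender encodes
        // each dchar as UTF-8.
        auto result = appender!string();
        foreach (b; data)
            result.put(cast(dchar) b);
        return result.data;
    }
}

void main()
{
    // "café" in Latin-1: the final byte 0xE9 is 'é'.
    const(ubyte)[] latin1 = [0x63, 0x61, 0x66, 0xE9];
    writeln(toUtf8Guess(latin1)); // prints café
}
```

Phobos's std.encoding module (Latin1String, transcode) is another way to do the Latin-1 conversion; the manual loop above just makes the byte-to-code-point mapping explicit.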

Then you can get data out of the document with things like:

import std.stdio : writeln;

auto firstParagraph = document.querySelector("p:first-child");
if(firstParagraph is null)
    writeln("no first child paragraph");
else
    writeln("first child paragraph text: ", firstParagraph.innerText);

and so on. If you have used JavaScript before, dom.d should look 
fairly familiar.


More information about the Digitalmars-d-learn mailing list