Can I parse this kind of HTML with arsd.dom module?

Timoses timosesu at gmail.com
Sun Jun 24 10:49:51 UTC 2018


On Sunday, 24 June 2018 at 03:46:09 UTC, Dr.No wrote:
> 	string html = get(page, client).text;
> 	auto document = new Document();
> 	document.parseGarbage(html);
> Element attEle = document.querySelector("span[id=link2]");
> 	Element aEle = attEle.querySelector("a");
> string link = aEle.href; // <-- if the href contains a space, it
> // returns "href" rather than the link
>
> [...]
>
> <body bgcolor="#000000">
> <font color="yellow">
> <h2>
> 	Hello, dear world!
> 	<span id="link2">
> <a href = "https://hostname.com/?file=foo.png&foo=baa">G!</a>
> 	</span>
> </h2>
> </font>
missing </body>

Seems to be buggy: the part of the parsed document referring to the "a" 
element looks like this:

<a "https:=""https:" href="href" />G!



