Class for fetching a web page and parsing it into a DOM

Adam D. Ruppe destructionator at
Thu Dec 15 06:49:13 PST 2011

On Thursday, 15 December 2011 at 09:55:22 UTC, breezes wrote:
> Is there a class that can fetch a web page from the internet? 
> And is std.xml the right module for parsing it
> into a DOM tree?

You might want to use my dom.d

Grab dom.d, characterencodings.d, and curl.d.

Here's an example program:

import arsd.dom;
import arsd.curl;

import std.stdio;

void main() {
	auto document = new Document();
	// fetch the page and parse it; parseGarbage copes with
	// badly formed real-world html
	document.parseGarbage(curl("http://www.example.com/"));
	// print the html of the first paragraph element
	writeln(document.querySelector("p").toString());
}


Compile like this:

dmd yourfile.d dom.d characterencodings.d curl.d

You'll need the curl C library from an outside source. If you're
on Linux, it is probably already installed. If you're on Windows,
check the Internet.
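On a Debian-style Linux, one way to get the curl C library's development files is something like this (the exact package name is an assumption and varies by distribution and release):

```shell
# install the curl C library headers so dmd can link against it
# (package name is a guess; check your distribution's repositories)
apt-get install libcurl4-openssl-dev
```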

// this downloads a file from the web and returns it as a string
string html = curl("http://www.example.com/");

// this builds a DOM tree out of html. It's called parseGarbage
// because it tries to figure out really bad html - so it works
// on a lot of web sites.
document.parseGarbage(html); // html being the string curl returned

// My dom.d includes a lot of functions you might know from
// javascript like getElementById, getElementsByTagName, and the
// get element by CSS selector functions
document.querySelector("p"); // get the first paragraph
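As a sketch of those javascript-style functions (the html snippet, ids, and class names here are made up for illustration, and this assumes the arsd.dom API of the time):

```d
import arsd.dom;
import std.stdio;

void main() {
	auto document = new Document();
	// parse a small made-up snippet instead of a live page
	document.parseGarbage("<html><body>" ~
		"<div id=\"header\">Hi</div>" ~
		"<p class=\"intro\">first</p><p>second</p>" ~
		"</body></html>");

	// look up a single element by id, like in javascript
	auto header = document.getElementById("header");
	writeln(header.innerText);

	// get all elements with a given tag name
	auto paragraphs = document.getElementsByTagName("p");
	writeln(paragraphs.length);

	// or use a css selector for the first match
	writeln(document.querySelector("p.intro").innerText);
}
```

Compile it the same way as above, passing dom.d and characterencodings.d on the dmd command line.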

And then, finally, the writeln puts out the html of an element.

More information about the Digitalmars-d-learn mailing list