HTML Parsing lib

Adam D. Ruppe via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Sat Oct 25 13:19:30 PDT 2014


Another option for HTML is my dom.d:

https://github.com/adamdruppe/arsd

Put dom.d and characterencodings.d in your project directory.

Compile with: dmd yourfile.d dom.d characterencodings.d

Here's an example:

import arsd.dom;

void main() {
    auto document = new Document();

    // The example document will be defined inline here
    // We could also load the string from a file with
    // std.file.readText or the web with std.net.curl.get
    document.parseGarbage(`<html><head>
      <meta name="author" content="Adam D. Ruppe">
      <title>Test Document</title>
    </head>
    <body>
      <p>This is the first paragraph of our <a href="test.html">test document</a>.
      <p>This second paragraph also has a <a href="test2.html">link</a>.
      <p id="custom-paragraph">Old text</p>
    </body>
    </html>`);

    import std.stdio;
    // retrieve and print some meta information
    writeln(document.title);
    writeln(document.getMeta("author"));
    // show a paragraph’s text
    writeln(document.requireSelector("p").innerText);
    // modify all links
    document["a[href]"].setValue("source", "your-site");
    // change some html
    document.requireElementById("custom-paragraph").innerHTML = "New <b>HTML</b>!";
    // show the new document
    writeln(document.toString());
}




You can also replace the inline HTML string with something like
std.file.readText("yourfile.html") to parse a file from disk.
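
A minimal sketch of that, assuming a UTF-8 file named yourfile.html exists in the working directory. The helper name loadDocument is hypothetical and only there for illustration; it uses the same parseGarbage call as the example above:

```d
import arsd.dom;
import std.file : readText;
import std.stdio : writeln;

// Hypothetical helper (not part of dom.d): read an HTML file and parse it.
Document loadDocument(string path) {
    auto document = new Document();
    // readText returns the whole file as a string and throws if the
    // file is missing or is not valid UTF-8.
    document.parseGarbage(readText(path));
    return document;
}

void main() {
    // "yourfile.html" is a placeholder path.
    auto document = loadDocument("yourfile.html");
    writeln(document.title);
}
```

It compiles the same way as before: dmd yourfile.d dom.d characterencodings.d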


My library is meant to give an API similar to the one in JavaScript.


I don't use dub, so I don't know how to use it with that. I just
recommend adding my files to your project if you want to try it.

