Extracting Structure from HTML using Adam's dom.d
"Nordlöw" via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Wed Jan 21 15:31:25 PST 2015
I'm trying to figure out how to most easily extract structured
information using Adam D Ruppe's dom.d.
Typically I want the following HTML example
...
<h2> <span class="mw-headline" id="H2_A">More important</span>
</h2>
<p>This is <i>important</i>.</p>
<h2> <span class="mw-headline" id="H2_B">Less important</span>
</h2>
<p>This is not important.</p>
...
to be reduced to
This is <i>important</i>.
This means that I need some kind of interface to extract all the
contents of each <p> paragraph that is preceded by an <h2>
heading with a specific id (say "H2_A") or content (say "More
important"). How do I accomplish that?
Further, is there a way to extract only the "contents" of an
Element instance, that is, "Stuff" from "<p>Stuff</p>", for each
Element returned by, for example, getElementsByTagName(`p`)?