Extracting Structure from HTML using Adam's dom.d

"Nordlöw" via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Wed Jan 21 15:31:25 PST 2015


I'm trying to figure out how to most easily extract structured 
information using Adam D Ruppe's dom.d.

Typically I want the following HTML example

...
<h2> <span class="mw-headline" id="H2_A">More important</span> </h2>
<p>This is <i>important</i>.</p>
<h2> <span class="mw-headline" id="H2_B">Less important</span> </h2>
<p>This is not important.</p>
...

to be reduced to

This is <i>important</i>.

This means that I need some kind of interface to extract the 
contents of each <p> paragraph that is preceded by an <h2> 
heading whose <span> has a specific id (say "H2_A") or content 
(say "More important"). How do I accomplish that?
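
To make the question concrete, here is a rough sketch of the kind of lookup I have in mind. It is an untested guess, not a known-good answer: it assumes dom.d exposes Document.getElementById, Element.parentNode, Element.nextSibling(tagName) and Element.innerHTML with roughly these meanings. The helper name paragraphAfterHeadline and the sampleHtml constant are made up for illustration.

```d
import arsd.dom;              // Adam D. Ruppe's dom.d
import std.stdio : writeln;

// The example from above, wrapped in <html><body> so it parses as a document.
enum sampleHtml = `<html><body>
<h2> <span class="mw-headline" id="H2_A">More important</span> </h2>
<p>This is <i>important</i>.</p>
<h2> <span class="mw-headline" id="H2_B">Less important</span> </h2>
<p>This is not important.</p>
</body></html>`;

// Hypothetical helper (not part of dom.d): returns the inner markup of the
// first <p> sibling following the <h2> that contains the element with the
// given id, or null if any step of the lookup fails.
string paragraphAfterHeadline(Document document, string headlineId)
{
    auto headline = document.getElementById(headlineId); // the <span>
    if (headline is null)
        return null;

    auto heading = headline.parentNode;                  // the enclosing <h2>
    if (heading is null)
        return null;

    // Assumption: nextSibling("p") skips whitespace text nodes and returns
    // the first later sibling whose tag name is "p".
    auto paragraph = heading.nextSibling("p");
    return paragraph is null ? null : paragraph.innerHTML;
}

void main()
{
    auto document = new Document(sampleHtml);
    writeln(paragraphAfterHeadline(document, "H2_A"));
}
```

One caveat with this approach: if a heading had no paragraph of its own, the sibling lookup would presumably run on to the <p> that belongs to a later heading, so a stricter version would have to stop at the next <h2>.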

Further, is there a way to extract only the "contents" of an 
Element instance, that is, "Stuff" from "<p>Stuff</p>", for each 
Element returned by, for example, getElementsByTagName(`p`)?
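
For this second part, my guess is that something like the following would do it. It is only a sketch and assumes that Element has an innerHTML property that returns the markup between the opening and closing tags. The helper name contentsOf is invented for the example.

```d
import arsd.dom;              // Adam D. Ruppe's dom.d
import std.stdio : writeln;

// Hypothetical helper (not part of dom.d): collects the inner markup of
// every element with the given tag name, in document order.
string[] contentsOf(Document document, string tagName)
{
    string[] result;
    foreach (element; document.getElementsByTagName(tagName))
        result ~= element.innerHTML;   // markup between <p> and </p>
    return result;
}

void main()
{
    auto document = new Document(
        `<html><body><p>Stuff</p><p>More <i>stuff</i></p></body></html>`);

    foreach (contents; contentsOf(document, "p"))
        writeln(contents);
}
```

If only the text without nested tags is wanted, I assume an innerText property would be the one to use instead of innerHTML.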

