Extracting Structure from HTML using Adam's dom.d

Adam D. Ruppe via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Wed Jan 21 18:06:15 PST 2015


On Wednesday, 21 January 2015 at 23:31:26 UTC, Nordlöw wrote:
> This means that I need some kind of interface to extract all 
> the contents of each <p> paragraph that is preceded by an <h2> 
> heading with a specific id (say "H2_A") or content (say "More 
> important"). How do I accomplish that?

You can do that with a CSS selector like:

document.querySelector("#H2_A + p");

or even document.querySelectorAll("h2 + p") to get every <p> 
immediately following an <h2>.
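
Here is a minimal sketch of both selectors in use. It assumes dom.d is 
importable as the arsd.dom module; the sample markup and the second 
heading id "H2_B" are made up for illustration.

```d
import arsd.dom;
import std.stdio;

// Sample markup invented for this example; only "H2_A" comes from the question.
enum sampleHtml =
    `<html><body>` ~
    `<h2 id="H2_A">More important</h2><p>First paragraph</p>` ~
    `<h2 id="H2_B">Less important</h2><p>Second paragraph</p>` ~
    `</body></html>`;

void main()
{
    auto document = new Document(sampleHtml);

    // The <p> immediately following the element with id "H2_A".
    // querySelector returns null when nothing matches, so check first.
    auto first = document.querySelector("#H2_A + p");
    if (first !is null)
        writeln(first.innerText);

    // Every <p> immediately following any <h2>.
    foreach (p; document.querySelectorAll("h2 + p"))
        writeln(p.innerText);
}
```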


My implementation works mostly the same as the one in JavaScript, so 
you can read more about CSS selectors anywhere on the net, for example 
https://developer.mozilla.org/en-US/docs/Web/API/Document.querySelector

> Further, is there a way to extract the "contents" only of an 
> Element instance, that is  "Stuff" from "<p>Stuff</p>" for each 
> Element in the return of for example getElementsByTagName(`p`)?

Element.innerText returns all the plain text inside with all tags 
stripped out (same as the function in IE)

Element.innerHTML returns all the content inside, including tags 
(same as the function in all browsers)

Element.firstInnerText returns all the text up to the first tag, 
but then stops there. (this is a custom extension)
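
To see the difference between the three, here is a small sketch that 
prints each one for the same element. The markup is invented for the 
example, and dom.d is assumed to be importable as arsd.dom.

```d
import arsd.dom;
import std.stdio;

void main()
{
    // Sample markup invented for this example.
    auto document = new Document(
        `<html><body><p>Stuff <b>bold</b> more</p></body></html>`);
    auto p = document.querySelector("p");

    writeln(p.innerText);      // plain text with the <b> tags stripped out
    writeln(p.innerHTML);      // inner markup, <b> tags included
    writeln(p.firstInnerText); // only the text before the first child tag
}
```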


You can call those in a regular foreach loop or with something 
like std.algorithm.map to get the info from an array of elements.
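
Both approaches might look roughly like this. It is a sketch, again 
assuming the arsd.dom module name and some made-up markup; the variable 
name texts is only for illustration.

```d
import arsd.dom;
import std.algorithm : map;
import std.array : array;
import std.stdio;

void main()
{
    // Sample markup invented for this example.
    auto document = new Document(
        `<html><body><p>One</p><p>Two</p></body></html>`);

    // Regular foreach loop over the matched elements.
    foreach (e; document.getElementsByTagName(`p`))
        writeln(e.innerText);

    // The same extraction with std.algorithm.map, collected into a string[].
    string[] texts = document.getElementsByTagName(`p`)
        .map!(e => e.innerText)
        .array;
    writeln(texts);
}
```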

