Learning to XML with D
Derix via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Mon Feb 9 03:54:43 PST 2015
> my dom.d works in a familiar way
OK, will check it
> useful for scraping html sites.
Not exactly what I'm doing, but close. I'm in the midst of a
self-training spree, and what I use as test-tube fodder is the
following: a collection of 300+ HTML files constituting an
electronic version of a technical book. My intent is to generate
a clickable table of contents by parsing the files for the CSS
styles specific to section headers. The first leg of the journey
was to normalize styles across the bunch. That is done, more or
less. I already have a proto-TOC, but it is not entirely satisfying:
it lacks handles for proper styling, and the way I arrived there is
kinda brutish.
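To make the TOC idea above concrete, here is a minimal sketch of the approach: walk each file, pick out the <p> elements whose CSS class marks them as headers, and emit one linked line per header. It is written in Python with only the standard library, purely as an illustration; it is not dom.d and not the code actually used for the proto-TOC. The class names "chapter-title" and "section-title", the helper toc_entries, and the file name in the usage note are all assumptions invented for the example.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical class names mapped to TOC depth; the real stylesheet will differ.
HEADER_CLASSES = {"chapter-title": 1, "section-title": 2}


class HeaderCollector(HTMLParser):
    """Collects (level, text) for every <p> whose class marks a header."""

    def __init__(self):
        super().__init__()
        self.headers = []   # finished (level, text) pairs
        self._level = None  # depth of the header being read, or None
        self._parts = []    # text fragments of the header being read

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            classes = (dict(attrs).get("class") or "").split()
            for name in classes:
                if name in HEADER_CLASSES:
                    self._level = HEADER_CLASSES[name]
                    self._parts = []
                    break

    def handle_data(self, data):
        # Only keep text while we are inside a header paragraph.
        if self._level is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._level is not None:
            self.headers.append((self._level, "".join(self._parts).strip()))
            self._level = None


def toc_entries(filename, html_text):
    """Returns one indented, linked TOC line per header found in html_text."""
    collector = HeaderCollector()
    collector.feed(html_text)
    collector.close()
    return [
        '{}<a href="{}">{}</a>'.format(
            "  " * (level - 1), escape(filename), escape(text)
        )
        for level, text in collector.headers
    ]
```

Calling toc_entries("ch01.html", text) once per file and concatenating the results gives a flat, indented list of links. Keeping the depth alongside each entry is what leaves room for the styling handles mentioned above, since each level can later be given its own class.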
One hurdle I haven't overcome yet is that the text content, and
the section headers themselves, contain some HTML tags (well, the
book /is/ about HTML, among other things). For example, some
section headers are rendered as two bold lines, with a fat <br/>
in the middle and <b></b> around them. So when I parse the payload of
the <p> element, I end up with some <br/> in the middle of
a sentence. Survivable, but unclean.
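One possible way around the stray <br/> problem, sketched under assumptions: instead of taking the raw payload of the <p> element, collect only its text nodes, turn each <br/> into a single space, and drop the other inline tags such as <b>. The sketch below uses Python's standard html.parser only as an illustration; it is not dom.d, and the names TextFlattener and flatten are made up for the example.

```python
import re
from html.parser import HTMLParser


class TextFlattener(HTMLParser):
    """Gathers text content, turning <br> into a space and dropping other tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # <br> and <br/> both arrive here; other tags are simply ignored.
        if tag == "br":
            self.parts.append(" ")

    def handle_data(self, data):
        self.parts.append(data)


def flatten(fragment):
    """Returns the fragment's text on one line, with whitespace collapsed."""
    parser = TextFlattener()
    parser.feed(fragment)
    parser.close()
    return re.sub(r"\s+", " ", "".join(parser.parts)).strip()
```

With this, a two-line bold header such as '<b>First line<br/>second line</b>' comes out as 'First line second line'. Escaped markup in the header text (for instance &lt;p&gt;) is decoded to literal characters, so it needs re-escaping before being written back into an HTML table of contents.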
So yeah, I'll give it another try with your dom.d.