How to parse epub content
Adam D. Ruppe
destructionator at gmail.com
Sat Jan 11 13:53:20 UTC 2020
On Saturday, 11 January 2020 at 12:38:38 UTC, Adnan wrote:
> How would someone approach parsing epub files in D? Is there
> any libraries to parse XHTML?
I've done it before with my dom.d easily enough.
The epub itself is a zip file. You can simply unzip it ahead of
time, or use std.zip to read the contents directly (basic zip file
support is in Phobos).
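For the std.zip route, here is a minimal sketch of pulling the XHTML members out of the archive. The helper name xhtmlMembers is made up for illustration, and it assumes the files are UTF-8 and use one of the listed extensions. A real reader would probably look at the epub's manifest instead of guessing by extension.

```d
import std.algorithm.searching : endsWith;
import std.file : read;
import std.stdio : writeln;
import std.zip : ZipArchive;

// Hypothetical helper: returns the contents of each (X)HTML member of an
// epub, keyed by its path inside the archive. Assumes UTF-8 text.
string[string] xhtmlMembers(void[] epubBytes)
{
    string[string] pages;
    auto zip = new ZipArchive(epubBytes);
    foreach (name, member; zip.directory)
    {
        // endsWith returns 0 when none of the suffixes match
        if (name.endsWith(".xhtml", ".html", ".htm"))
        {
            // expand() decompresses the member and returns its raw bytes
            pages[name] = cast(string) zip.expand(member).idup;
        }
    }
    return pages;
}

void main(string[] args)
{
    // Assumes the path to an epub is passed as the first argument.
    if (args.length < 2)
    {
        writeln("usage: ", args[0], " book.epub");
        return;
    }
    foreach (name, text; xhtmlMembers(read(args[1])))
        writeln(name, ": ", text.length, " bytes");
}
```

Each string it returns can then be handed to whatever XML parser you like.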
Then once you get inside, there are xhtml files, which again are
easy enough to parse. With my dom.d it is as simple as:
import arsd.dom;
import std.stdio; // for writeln

// the true, true here tells it to use strict xml mode for xhtml;
// it isn't really necessary, so leaving it off is ok too
auto document = new Document(string_holding_xml, true, true);
foreach(ele; document.querySelectorAll("p"))
    writeln(ele.innerText);
The API there is similar to JavaScript's, if you're familiar with
that.