simple sax-style xml parser
ketmar via Digitalmars-d-announce
digitalmars-d-announce at puremagic.com
Tue Jul 19 18:49:37 PDT 2016
i wrote a simple sax-style xml parser[1][2] for my own needs, and
decided to share it. it has two interfaces: the `xmparse()` function,
which simply calls callbacks without any validation or encoding
conversion, and the `SaxyEx` class, which does some validation,
converts content to utf-8 (from anything std.encoding supports),
and calls callbacks when the given path is triggered.
it can parse any `char` input range, or std.stdio.File. parsing
files is probably slightly faster than parsing ranges.
internally it extensively reuses the memory buffers it has allocated,
so it should not put much pressure on the GC.
because of that, you are expected to copy any data you need in
callbacks (not just slice it, but .dup it!).
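
to show why a plain slice is not enough, here is a minimal sketch. it does not use saxy.d at all: `sliceVsDup`, `onText`, `buf` and `Result` are hypothetical names standing in for a parser that hands its callbacks a view into one reused buffer.

```d
import std.stdio : writeln;

// hypothetical result holder, only for this illustration
struct Result { string viaSlice; string viaDup; }

Result sliceVsDup() {
    // stand-in for the parser's internal buffer, reused for every event
    char[] buf = new char[](16);
    const(char)[][] sliced; // keeps views into `buf`
    char[][] copied;        // keeps independent copies

    // stand-in for a user callback
    void onText(const(char)[] text) {
        sliced ~= text;     // still points into the reused buffer
        copied ~= text.dup; // owns its own memory
    }

    foreach (word; ["first", "second"]) {
        buf[0 .. word.length] = word[]; // "parser" overwrites its buffer
        onText(buf[0 .. word.length]);
    }
    return Result(sliced[0].idup, copied[0].idup);
}

void main() {
    auto r = sliceVsDup();
    writeln(r.viaSlice); // secon (overwritten by the second event)
    writeln(r.viaDup);   // first
}
```

the slice saved during the first event ends up showing bytes of the second one, while the .dup'ed copy keeps what the callback actually saw.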
so far i'm using it to parse fb2 files, and it parses an 8.5
megabyte utf-8 file (and creates internal reader structures,
including splitting text into words and some other housekeeping) in
one second on my i3 (with dmd -O, even without -inline and
-release).
it is not really documented, but i think it is "intuitive". there
are also some comments in source code; please, read those! ;-)
p.s. it decodes standard xml entities (&# and &#x probably work
right only in utf-8 files, though), and understands CDATA and
comments.
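
for reference, a numeric reference like `&#65;` or `&#x41;` names a unicode code point, and in utf-8 output it becomes that code point's utf-8 bytes. a minimal sketch of that step, assuming the text between `&#` and `;` has already been cut out; `decodeCharRef` is a hypothetical helper, not something taken from saxy.d:

```d
import std.conv : to;
import std.stdio : writeln;
import std.utf : encode;

// hypothetical helper: takes the part between "&#" and ";",
// e.g. "65" (decimal) or "x41" (hex), and returns it as utf-8
string decodeCharRef(const(char)[] digits) {
    immutable uint code = (digits.length > 0 && digits[0] == 'x')
        ? digits[1 .. $].to!uint(16) // hexadecimal form: &#x...;
        : digits.to!uint;            // decimal form: &#...;
    char[4] buf;
    // std.utf.encode throws on invalid code points
    immutable len = encode(buf, cast(dchar) code);
    return buf[0 .. len].idup;
}

void main() {
    writeln(decodeCharRef("65"));    // A
    writeln(decodeCharRef("x41"));   // A
    writeln(decodeCharRef("x20AC")); // €
}
```

this always emits utf-8, which is only the right answer when the rest of the document is utf-8 too.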
enjoy, and happy hacking!
[1] http://repo.or.cz/iv.d.git/blob_plain/HEAD:/saxy.d
[2] http://repo.or.cz/iv.d.git/tree/HEAD:/saxytests