simple sax-style xml parser

ketmar via Digitalmars-d-announce digitalmars-d-announce at puremagic.com
Tue Jul 19 18:49:37 PDT 2016


i wrote a simple sax-style xml parser[1][2] for my own needs, and 
decided to share it. it has two interfaces: the `xmparse()` function, 
which simply calls callbacks without any validation or encoding 
conversion, and the `SaxyEx` class, which does some validation, 
converts content to utf-8 (from anything std.encoding supports), 
and calls callbacks when the given path is triggered.

it can parse any `char` input range, or std.stdio.File. parsing 
files is probably slightly faster than parsing ranges.

internally it extensively reuses the memory buffers it allocates, 
so it should not put much pressure on the GC.

you are expected to copy any data you need in callbacks (not just 
slice, but .dup!).
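
a rough sketch of why a plain slice is not enough. `ToyParser`, `feed()`, 
`Kept` and `demo()` are made-up names for illustration only, they are not 
saxy's api; the toy just mimics a parser that hands callbacks a slice of 
one reused buffer:

```d
import std.stdio : writeln;

// hypothetical toy "parser": NOT saxy's api, it only mimics a reused buffer
struct ToyParser {
  char[] buf; // one buffer, reused for every piece of text

  void feed (const(char)[] text, scope void delegate (const(char)[] s) cb) {
    if (buf.length < text.length) buf.length = text.length;
    buf[0..text.length] = text[];
    cb(buf[0..text.length]); // the callback gets a slice of the shared buffer
  }
}

struct Kept {
  const(char)[] sliced; // just a slice: still points into the parser buffer
  char[] copied;        // a real copy, made with .dup
}

Kept demo () {
  ToyParser p;
  Kept k;
  p.feed("first", delegate (const(char)[] s) { k.sliced = s; k.copied = s.dup; });
  // same length as before, so the buffer is overwritten in place
  p.feed("other", delegate (const(char)[] s) {});
  return k;
}

void main () {
  auto k = demo();
  writeln(k.sliced); // prints "other": the first text is gone
  writeln(k.copied); // prints "first"
}
```

the slice silently changes as soon as the buffer is reused, while the 
.dup'ed copy stays intact.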

so far i'm using it to parse fb2 files, and it parses an 8.5 
megabyte utf-8 file (and creates internal reader structures, 
including splitting text into words and some other housekeeping) in 
one second on my i3 (with dmd -O, even without -inline and 
-release).

it is not really documented, but i think it is "intuitive". there 
are also some comments in source code; please, read those! ;-)

p.s. it decodes standard xml entities (&# and &#x probably work 
right only in utf-8 files, though), and understands CDATA and 
comments.
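
to illustrate the utf-8 remark: a numeric reference names a unicode code 
point, so a decoder that writes it out as utf-8 bytes gives correct text 
only when everything around it is utf-8 too. the sketch below is an 
assumption about how such decoding can be done, not code from saxy.d; 
`decodeNumericRef` is a made-up helper name:

```d
import std.conv : to;
import std.stdio : writeln;
import std.utf : encode;

// hypothetical helper, not taken from saxy.d.
// takes the inside of a numeric reference ("#65" or "#x416", no '&' and no ';')
// and returns the code point encoded as utf-8.
char[] decodeNumericRef (const(char)[] refText) {
  assert(refText.length > 1 && refText[0] == '#');
  immutable bool hex = (refText[1] == 'x');
  // "#x..." is hexadecimal, "#..." is decimal
  immutable uint code = (hex ? refText[2..$].to!uint(16) : refText[1..$].to!uint);
  char[4] tmp;
  immutable len = encode(tmp, cast(dchar)code); // throws UTFException on an invalid code point
  return tmp[0..len].dup;
}

void main () {
  writeln(decodeNumericRef("#65"));   // prints "A"
  writeln(decodeNumericRef("#x416")); // U+0416, two utf-8 bytes
}
```

if the surrounding text were, say, koi8-r or cp1251, those two bytes would 
show up as garbage unless the content is converted to utf-8 first.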


enjoy, and happy hacking!


[1] http://repo.or.cz/iv.d.git/blob_plain/HEAD:/saxy.d
[2] http://repo.or.cz/iv.d.git/tree/HEAD:/saxytests

