simple sax-style xml parser

Chris via Digitalmars-d-announce digitalmars-d-announce at puremagic.com
Fri Jul 29 07:47:08 PDT 2016


On Wednesday, 20 July 2016 at 01:49:37 UTC, ketmar wrote:
> i wrote a simple sax-style xml parser[1][2] for my own needs, 
> and decided to share it. it has two interfaces: `xmparse()` 
> function which simply calls callbacks without any validation or 
> encoding conversion, and `SaxyEx` class, which does some 
> validation, converts content to utf-8 (from anything 
> std.encoding supports), and calls callbacks when the given path 
> is triggered.
>
> it can parse any `char` input range, or std.stdio.File. parsing 
> files is probably slightly faster than parsing ranges.
>
> internally it extensively reuses the memory buffers it 
> allocates, so it should not put much pressure on the GC.
>
> you are expected to copy any data you need in callbacks (not 
> just slice, but .dup!).
>
> so far i'm using it to parse fb2 files, and it parses an 8.5 
> megabyte utf-8 file (including creating internal reader 
> structures, splitting text into words and some other 
> housekeeping) in one second on my i3 (with dmd -O, even without 
> -inline and -release).
>
> it is not really documented, but i think it is "intuitive". 
> there are also some comments in source code; please, read 
> those! ;-)
>
> p.s. it decodes standard xml entities (&# and &#x probably 
> work right only in utf-8 files, though), and understands CDATA 
> and comments.
>
>
> enjoy, and happy hacking!
>
>
> [1] http://repo.or.cz/iv.d.git/blob_plain/HEAD:/saxy.d
> [2] http://repo.or.cz/iv.d.git/tree/HEAD:/saxytests

Thanks. I might actually use it. I need an XML parser and wrote a 
very basic and incomplete one for my needs.
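
The quoted advice about copying callback data (`.dup`, not just a slice) can be illustrated with a minimal sketch. It does not use saxy itself: `feedChunks` and its callback signature are hypothetical stand-ins for any parser that reuses one internal buffer, and the real saxy callbacks may look different.

```d
import std.stdio;

// Hypothetical stand-in for a parser that reuses one internal buffer.
// This is NOT the saxy API, only an illustration of buffer reuse.
void feedChunks(scope void delegate(const(char)[] text) onText)
{
    auto buf = new char[](16);              // one buffer, reused for every chunk
    foreach (chunk; ["first", "second"])
    {
        buf[0 .. chunk.length] = chunk[];   // overwrite the previous contents
        onText(buf[0 .. chunk.length]);     // slice is only valid during the call
    }
}

// Keeps bare slices: every entry aliases the parser's reused buffer.
const(char)[][] collectSliced()
{
    const(char)[][] result;
    feedChunks((const(char)[] text) { result ~= text; });
    return result;
}

// Keeps copies: every entry owns its own memory.
char[][] collectCopied()
{
    char[][] result;
    feedChunks((const(char)[] text) { result ~= text.dup; });
    return result;
}

void main()
{
    auto sliced = collectSliced();
    auto copied = collectCopied();

    // The first slice was overwritten by the second chunk.
    assert(sliced[0] == "secon");
    assert(sliced[1] == "second");

    // The copies still hold what the callback saw.
    assert(copied[0] == "first");
    assert(copied[1] == "second");

    writeln("sliced: ", sliced);
    writeln("copied: ", copied);
}
```

The copy costs one allocation per kept string, which is the price of letting the parser reuse its buffers instead of allocating for every event.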

