New XML parser written for D1 and D2.

Michael Rynn michaelrynn at optusnet.com.au
Wed Oct 14 03:35:12 PDT 2009



I have made a validating, or optionally non-validating, XML parser in
D.

It can read and parse files, external DTDs and entities with
different BOMs and encodings.
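
To illustrate what the BOM handling involves, here is a minimal sketch
of byte order mark sniffing. It is not the xmlp code itself; the
function name detectBom and the returned encoding names are made up for
this example. The only subtle point is ordering: the UTF-32 signatures
have to be checked before the UTF-16 ones, because FF FE 00 00 also
starts with FF FE.

```d
import std.stdio : writeln;

// Known byte order mark signatures.
static immutable ubyte[] bomUtf32be = [0x00, 0x00, 0xFE, 0xFF];
static immutable ubyte[] bomUtf32le = [0xFF, 0xFE, 0x00, 0x00];
static immutable ubyte[] bomUtf8    = [0xEF, 0xBB, 0xBF];
static immutable ubyte[] bomUtf16be = [0xFE, 0xFF];
static immutable ubyte[] bomUtf16le = [0xFF, 0xFE];

/// Hypothetical helper (not part of xmlp): names the encoding signalled
/// by a leading byte order mark, or returns "none" if there is no BOM.
string detectBom(const(ubyte)[] data)
{
    static bool startsWith(const(ubyte)[] d, const(ubyte)[] sig)
    {
        return d.length >= sig.length && d[0 .. sig.length] == sig;
    }

    // UTF-32 first: the UTF-32LE mark begins with the UTF-16LE mark.
    if (startsWith(data, bomUtf32be)) return "UTF-32BE";
    if (startsWith(data, bomUtf32le)) return "UTF-32LE";
    if (startsWith(data, bomUtf8))    return "UTF-8";
    if (startsWith(data, bomUtf16be)) return "UTF-16BE";
    if (startsWith(data, bomUtf16le)) return "UTF-16LE";
    return "none";
}

void main()
{
    // UTF-8 BOM followed by the bytes of "<a/>".
    ubyte[] doc = [0xEF, 0xBB, 0xBF, 0x3C, 0x61, 0x2F, 0x3E];
    assert(detectBom(doc) == "UTF-8");
    writeln(detectBom(doc));
}
```

A document without a BOM still needs its encoding worked out from the
first bytes of the XML declaration, which this sketch does not attempt.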

This xmlp (XmlPieceParser class) passes 100% in both validating and
non-validating modes for the following test sets: oasis, sun, xmltest
and ibm.  I have not dared to try any of the XML 1.1 or other tests.
The warnings given for not well-formed or non-valid documents, if you
choose to intercept them, may not necessarily be illuminating.

My brief try of a modified std.xml against some of these tests led me
to chuck it, as I learned more about what the parser is actually
supposed to do.  This one is all my own mistakes and bad coding habits,
written from near scratch, after giving up on std.xml, and taking what
I could from std.encoding.
I have also made a front end xmlp.delegator module that emulates the
delegate callback model of std.xml.

To use it, you need a class derived from XmlParserInput, of which
there are two implementations, StreamParserInput and StringParserInput,
in xmlp.input.  These wrap an InputRange interface (empty, front,
popFront).  The bool validate flag is false by default.
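
For anyone not familiar with the InputRange idea, the three primitives
look like this. This is only a sketch of the general pattern, not the
xmlp.input classes; the CharSource name is invented for the example.

```d
import std.range : isInputRange;
import std.stdio : writeln;

/// Hypothetical example type (not part of xmlp): exposes a string
/// through the three InputRange primitives.
struct CharSource
{
    private string data;
    private size_t pos;

    this(string s) { data = s; }

    /// True once every character has been consumed.
    @property bool empty() const { return pos >= data.length; }

    /// The current character; only valid while not empty.
    @property char front() const { return data[pos]; }

    /// Advance to the next character.
    void popFront() { ++pos; }
}

// Compile-time check that the struct satisfies the InputRange contract.
static assert(isInputRange!CharSource);

/// Counts the elements remaining in any char source built this way.
size_t countChars(CharSource src)
{
    size_t n;
    // foreach drives the range through empty, front and popFront.
    foreach (c; src)
        ++n;
    return n;
}

void main()
{
    auto src = CharSource("<a/>");
    assert(src.front == '<');
    assert(countChars(src) == 4);
    writeln(countChars(src));
}
```

Anything that offers these three members can be consumed one item at a
time without the consumer knowing whether it came from a file, a stream
or a string in memory.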

Give a new XmlPieceParser the input, and an optional base directory
path, and call nextPiece()  repeatedly to get bits of the rather
sparse  XmlTree model  defined in xmlp.xmldom. Or call the static
XmlPieceParser.ReadDocument to get the entire thing at once.

This parser should be adaptable to use with Tango, as there is only
minimal dependence on Phobos.
I don't know how the Tango XML parser would cope with the W3C tests.
Any resemblance of this to the Tango XML parser will be pure
coincidence, as a brief glance at the Tango code some long time ago
left me none the wiser.

I learnt a lot of XML minutiae while getting it to parse the hundreds
of w3c test cases.  I've included the conformance test program and
scripts as one of the examples.

Some validation, such as the ELEMENT content particle validator, still
has wet glue and cement, and is not guaranteed to validate each and
every deterministic content model.

I am sure this release will be considered to be code bloated at the
moment.  With all those test cases, some conditional coding and
variants became a bit too contrived.  After coding it for a while it
just got too big, although I do think I got better at it towards the
end.  There is some scope for shrinkage. The Windows binary with D2 of
the XmlConformance test suite runner is 

Very possibly there is a non-validating parser inside that is a fair
bit smaller than this, which could one day be produced by conditional
compilation or re-coded from it.

The package has a base module name of xmlp.  I am not aiming for
std.xml as yet.  

There are of course lots of other things in the XML world, such as
schemas, RELAX NG, XSL, XPointer and XPath, and this parser almost
brings us to this century.

But I would like to have it made available so others can test.

Where and to whom can I post the 56 KB source code zip?

---------------------
Michael Rynn
