std.xml validity checking is absurd

Stewart Gordon smjg_1998 at yahoo.com
Thu Feb 7 14:22:05 PST 2013


Apologies if this has been talked about before.  I haven't been able to find it by a quick 
search of the 'group.  Apologies also if what I'm saying is already taken care of in the 
module that's being drafted as a replacement for std.xml.

This is what I've found: Validity checking is done in an in contract!

This is saying it's _illegal_ to construct a Document or DocumentParser from invalid XML, 
not just that it's an error condition.  This is contrary to the spirit of DBC and 
exception handling.  DBC is for detecting program bugs.  Exception handling is for dealing 
with unexpected conditions at runtime.
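
To make the distinction concrete, here is a minimal sketch that does not use std.xml at all.  The two functions parseDigitContract and parseDigitChecked are hypothetical names made up for this illustration.  The first treats bad input as a caller bug, the second treats it as a runtime condition the caller can recover from.

```d
import std.exception : enforce;

// Contract style: bad input is declared to be a bug in the caller.
// The check is an assert, so it disappears when built with -release.
int parseDigitContract(string s)
in
{
    assert(s.length == 1 && s[0] >= '0' && s[0] <= '9', "not a digit");
}
do
{
    return s[0] - '0';
}

// Exception style: bad input is an expected runtime condition.
// The check stays in every build and throws an ordinary Exception.
int parseDigitChecked(string s)
{
    enforce(s.length == 1 && s[0] >= '0' && s[0] <= '9',
            "not a digit: " ~ s);
    return s[0] - '0';
}

void main()
{
    assert(parseDigitContract("3") == 3);
    assert(parseDigitChecked("7") == 7);

    bool caught = false;
    try
    {
        parseDigitChecked("x");   // external data that turned out to be bad
    }
    catch (Exception e)
    {
        caught = true;            // recoverable: report it and carry on
    }
    assert(caught);
}
```

Input that comes from outside the program belongs in the second style, since the caller has no way to guarantee it in advance.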

99% of the time, XML data will come from a file or an external process, rather than being 
hard-coded in the program or even generated by and passed from another part of the 
program.  As such, invalid XML input is an unexpected condition at runtime, and verifying that 
the XML is syntactically and structurally valid is part of what the program needs to do. 
An invalid XML file is not a sign of a bug in a program - indeed, the failure to detect 
the invalidity of an XML file is.

This means that rather than just

Document data = new Document(xml);

you need to do

check(xml);
Document data = new Document(xml);

Consequently, in a development build the XML is parsed three times:
- first, through the call to the check function here
- then, when check is called again in DocumentParser's constructor's in contract
- and finally, in the body of the Document constructor as it is actually building the DOM.

This shouldn't be necessary.  Validity should be checked automatically while parsing the 
XML to build the DOM.  This would mean that the XML is parsed only once, which is much 
more efficient as well as being a first step towards enabling the XML to be read from a 
stream and parsed on the fly.
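
As a rough sketch of what I mean by checking while parsing, here is a toy parser for a made-up subset of XML (start and end tags only, no attributes, text or entities).  Node, ParseError and parseToy are all hypothetical names for this example and have nothing to do with how std.xml is actually implemented.  The point is only that the tree is built and the input is checked in a single pass, and that failure is an ordinary exception.

```d
// Hypothetical exception type for malformed input.
class ParseError : Exception
{
    this(string msg, string file = __FILE__, size_t line = __LINE__)
    {
        super(msg, file, line);
    }
}

// Hypothetical, minimal tree node.
class Node
{
    string name;
    Node[] children;
    this(string name) { this.name = name; }
}

// Builds the tree and checks nesting in the same pass over src.
Node parseToy(string src)
{
    Node[] stack;   // elements that are currently open
    Node root;
    size_t i = 0;
    while (i < src.length)
    {
        if (src[i] != '<')
            throw new ParseError("expected '<'");
        size_t end = i + 1;
        while (end < src.length && src[end] != '>')
            ++end;
        if (end == src.length)
            throw new ParseError("unterminated tag");
        string tag = src[i + 1 .. end];
        i = end + 1;

        if (tag.length > 0 && tag[0] == '/')
        {
            // An end tag must match the innermost open element.
            if (stack.length == 0 || stack[$ - 1].name != tag[1 .. $])
                throw new ParseError("mismatched end tag: " ~ tag);
            stack = stack[0 .. $ - 1];
        }
        else
        {
            if (tag.length == 0)
                throw new ParseError("empty tag name");
            if (stack.length == 0 && root !is null)
                throw new ParseError("more than one root element");
            auto node = new Node(tag);
            if (stack.length == 0)
                root = node;
            else
                stack[$ - 1].children ~= node;
            stack ~= node;
        }
    }
    if (root is null)
        throw new ParseError("no root element");
    if (stack.length != 0)
        throw new ParseError("unclosed element: " ~ stack[$ - 1].name);
    return root;
}

void main()
{
    auto doc = parseToy("<a><b></b><c></c></a>");
    assert(doc.name == "a" && doc.children.length == 2);

    bool rejected = false;
    try
    {
        parseToy("<a><b></a>");   // b is never closed
    }
    catch (ParseError e)
    {
        rejected = true;
    }
    assert(rejected);
}
```

Because nothing here needs the whole input up front except the slicing, the same structure could in principle be fed from a stream.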

And it should throw a normal exception if it fails, not trigger an assertion failure.  I 
haven't taken the time to figure out what actually happens if malformed XML is passed in 
under a release build, where contracts are compiled out.  But I don't suppose that it errors 
out gracefully in the general case.

Anyway ... is there going to be a fix for this, or is it just a case of waiting for the 
replacement for std.xml and using the workaround or an alternative XML parser in the meantime?

Stewart.

