std.xml validity checking is absurd
Stewart Gordon
smjg_1998 at yahoo.com
Thu Feb 7 14:22:05 PST 2013
Apologies if this has been talked about before. I haven't been able to find it by a quick
search of the 'group. Apologies also if what I'm saying is already taken care of in the
module that's being drafted as a replacement for std.xml.
This is what I've found: validity checking is done in an 'in' contract!
This is saying it's _illegal_ to construct a Document or DocumentParser from invalid XML,
not just that it's an error condition. This is contrary to the spirit of DBC and
exception handling. DBC is for detecting program bugs. Exception handling is for dealing
with unexpected conditions at runtime.
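To make the distinction concrete, here is a minimal sketch of the two styles. The names looksWellFormed, parseWithContract and parseWithException are made up for illustration and are not part of std.xml, and the check itself is only a placeholder. It uses the newer `do` keyword for the function body; older compilers spell it `body`.

```d
import std.exception : enforce;

// Hypothetical placeholder for a real well-formedness check.
bool looksWellFormed(string s)
{
    return s.length >= 2 && s[0] == '<' && s[$ - 1] == '>';
}

// Contract style: malformed input is treated as a bug in the caller.
// The assertion is compiled out when building with -release.
void parseWithContract(string xml)
in
{
    assert(looksWellFormed(xml), "caller passed malformed XML");
}
do
{
    // ... build the DOM here ...
}

// Exception style: malformed input is a runtime condition the caller
// can catch and handle. The check is present in every build.
void parseWithException(string xml)
{
    enforce(looksWellFormed(xml), "malformed XML");
    // ... build the DOM here ...
}

unittest
{
    import std.exception : assertThrown;

    parseWithException("<a/>");                // accepted
    assertThrown(parseWithException("oops"));  // throws a catchable Exception
}
```

With the first form, a caller has no sanctioned way to recover from bad input; with the second, it is an ordinary try/catch.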
99% of the time, XML data will come from a file or an external process, rather than being
hard-coded in the program or even generated by and passed from another part of the
program. As such, invalid XML input is an unexpected condition at runtime, and verifying that
the XML is syntactically and structurally valid is part of what the program needs to do.
An invalid XML file is not a sign of a bug in a program - indeed, the failure to detect
the invalidity of an XML file is.
This means that rather than just
    Document data = new Document(xml);
you need to do
    check(xml);
    Document data = new Document(xml);
Consequently, in a development build the XML is parsed three times:
- first, through the call to the check function here
- then, when check is called again in DocumentParser's constructor's in contract
- and finally, in the body of the Document constructor as it is actually building the DOM.
This shouldn't be necessary. Validity should be checked automatically while parsing the
XML to build the DOM. This would mean that the XML is parsed only once, which is much
more efficient as well as being a first step towards enabling the XML to be read from a
stream and parsed on the fly.
And it should throw a normal exception if it fails, not an assertion failure. I haven't
taken the time to figure out what actually happens if malformed XML is passed in a
release build. But I don't suppose that it errors out gracefully in the general case.
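Roughly what I have in mind, as a toy sketch rather than a proposal for the actual std.xml code: a parser that checks well-formedness in the same pass that builds the tree, and throws an ordinary exception on failure. ParseException, Node and parse are hypothetical names, and the subset handled is deliberately tiny (elements only; no attributes, comments, prolog or entities, and text content is skipped).

```d
/// Ordinary exception for malformed input; unlike a contract,
/// it is still thrown in a -release build.
class ParseException : Exception
{
    this(string msg, string file = __FILE__, size_t line = __LINE__)
    {
        super(msg, file, line);
    }
}

/// Toy DOM node: a tag name plus child elements.
class Node
{
    string name;
    Node[] children;
    this(string name) { this.name = name; }
}

/// Builds the tree and checks well-formedness in a single pass.
Node parse(string xml)
{
    Node root;
    Node[] open;   // stack of elements whose end tag has not been seen yet
    size_t i = 0;

    while (i < xml.length)
    {
        if (xml[i] != '<')
        {
            ++i;   // skip text content
            continue;
        }

        // Find the end of this tag.
        size_t end = i + 1;
        while (end < xml.length && xml[end] != '>')
            ++end;
        if (end == xml.length)
            throw new ParseException("unterminated tag");

        string tag = xml[i + 1 .. end];
        i = end + 1;

        if (tag.length > 0 && tag[0] == '/')
        {
            // End tag: must match the innermost open element.
            if (open.length == 0 || open[$ - 1].name != tag[1 .. $])
                throw new ParseException("unexpected end tag </" ~ tag[1 .. $] ~ ">");
            open = open[0 .. $ - 1];
            continue;
        }

        bool selfClosing = tag.length > 0 && tag[$ - 1] == '/';
        string name = selfClosing ? tag[0 .. $ - 1] : tag;
        if (name.length == 0)
            throw new ParseException("empty tag name");

        auto node = new Node(name);
        if (open.length > 0)
            open[$ - 1].children ~= node;   // attach to the current parent
        else if (root is null)
            root = node;
        else
            throw new ParseException("more than one root element");

        if (!selfClosing)
            open ~= node;
    }

    if (open.length > 0)
        throw new ParseException("missing end tag for <" ~ open[$ - 1].name ~ ">");
    if (root is null)
        throw new ParseException("no root element");
    return root;
}
```

The caller then writes a single parse(xml) inside a try/catch for ParseException, with no separate check pass, and because the parser only ever moves forward through the input, the same structure could in principle be fed from a stream.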
Anyway ... is there going to be a fix for this, or is it just a case of waiting for the
replacement for std.xml and using the workaround or an alternative XML parser in the meantime?
Stewart.