dxml behavior after exception: continue parsing

Jonathan M Davis newsgroup.d at jmdavisprog.com
Mon May 7 22:24:25 UTC 2018


On Monday, May 07, 2018 19:46:00 Jesse Phillips via Digitalmars-d-learn 
wrote:
> So I have an XML like document which fails to adhere completely
> to XML. One of these such events is that & is used without
> escaping.
>
> My observation is that after the exception it is possible to move
> to the next element without issue. Is this something expected and
> will be maintained?
>
>
>      try {
>          range.popFront();
>      } catch (Exception e) {
>          range.popFront;
>      }

The documentation on EntityRange / parseXML specifically states:

"If invalid XML is encountered at any point during the parsing process, an
XMLParsingException will be thrown. If an exception has been thrown, then
the parser is in an invalid state, and it is an error to call any functions
on it."

What happens if you continue parsing after an exception is effectively
undefined behavior and could vary wildly depending on what was invalid in
the XML and which part of the parser threw. It may very well be that in some
circumstances, you would be able to continue parsing without any real
negative side effects, but the parser could also end up asserting or doing
who-knows-what, because it's not in a valid state. I could add a member to
the parser which says whether it's in a valid state or not and then have the
parser throw if you try to call anything on it after an exception has been
thrown, but that would add overhead that I'd rather avoid. At most, such a
check would be done with assertions, just as the checks for whether you're
allowed to call name, text, or attributes are assertions.
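
As a rough sketch of what that means for calling code: catch the exception
once, around the whole loop, and never touch the range again afterwards. The
helper name collectNames is made up for illustration; parseXML, EntityType,
and XMLParsingException are the names from dxml.parser. The sketch assumes
that parseXML itself can throw, so the call sits inside the try block as
well.

```d
import dxml.parser;

/// Hypothetical helper: appends the names of all start and empty elements
/// to `names`. Returns false if parsing stopped because of invalid XML.
bool collectNames(string xml, ref string[] names)
{
    try
    {
        // Assumed to be able to throw too, so it is inside the try block.
        auto range = parseXML(xml);
        while (!range.empty)
        {
            auto entity = range.front;
            if (entity.type == EntityType.elementStart ||
                entity.type == EntityType.elementEmpty)
            {
                names ~= entity.name;
            }
            range.popFront();
        }
        return true;
    }
    catch (XMLParsingException e)
    {
        // The range is in an invalid state here, so nothing more is called
        // on it. Whatever was collected before the error is still in `names`.
        return false;
    }
}
```

Unlike the try/catch around a single popFront in the question, this gives up
on the document at the first error rather than trying to step past it.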

I've been considering adding more configuration options where you say
something like you don't care if any invalid characters are encountered, in
which case, you could cleanly parse past something like an unescaped &, but
you'd then potentially be operating on invalid XML without knowing it and
could get undesirable results depending on what exactly is wrong with the
XML. I haven't decided for sure whether I'm going to add any such
configuration options or how fine-grained they'd be, but either way, the
current behavior will continue to be the default behavior.
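
In the absence of such an option, one possible workaround for the unescaped
& case is to clean the text up before handing it to the parser. This is only
a sketch and not something dxml provides: escapeBareAmpersands is a
hypothetical helper, the pattern approximates XML names with ASCII
characters only, and it does not know about CDATA sections or comments,
where a bare & is legal and would get rewritten anyway.

```d
import std.regex : regex, replaceAll;

/// Hypothetical helper: replaces every '&' that does not start something
/// shaped like a reference ("&name;", "&#123;", "&#x1F;") with "&amp;".
string escapeBareAmpersands(string input)
{
    // Negative lookahead: match '&' only when it is NOT followed by an
    // ASCII-approximated name, a decimal number, or a hex number, and then
    // a semicolon.
    auto bare = regex(
        `&(?!(?:[A-Za-z_:](?:[A-Za-z0-9_.:]|-)*|#[0-9]+|#x[0-9A-Fa-f]+);)`);
    return replaceAll(input, bare, "&amp;");
}
```

With this, "x & y" becomes "x &amp; y", while "&amp;" and "&#38;" are left
alone. The same caveat as above applies: the input is being changed on a
guess about what the author of the document meant.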

- Jonathan M Davis


