dxml behavior after exception: continue parsing

Jonathan M Davis newsgroup.d at jmdavisprog.com
Mon May 7 22:34:55 UTC 2018


On Monday, May 07, 2018 22:16:58 Jesse Phillips via Digitalmars-d-learn 
wrote:
> On Monday, 7 May 2018 at 19:46:00 UTC, Jesse Phillips wrote:
> > So I have an XML-like document which fails to adhere completely
> > to XML. One such deviation is that & is used without
> > escaping.
> >
> > My observation is that after the exception it is possible to
> > move to the next element without issue. Is this something
> > expected and will be maintained?
> >
> >     try {
> >         range.popFront();
> >     } catch (Exception e) {
> >         range.popFront();
> >     }
>
> Ok so this worked when inside a quoted attribute value but not a
> normal tag body. Clearly I'm not parsing valid XML so I'm going
> outside the bounds of valid parameters. But rather than writing a
> custom parser to handle this, it would be nice to have:
>
>       try {
>           range.popFront();
>       } catch (Exception e) {
>           range.moveToNextTag();
>       }
>
> Which would make front a MalformedParse containing the content up
> to the next <.

I don't think that such an approach would work with how dxml does its
validation, because it's designed around the idea that only the range
farthest along does the validation (which was critical to avoiding memory
allocation in functions like save). Some validation is currently done by
every range, but my plan has been to look at making the other ranges do as
little validation as possible. Either way, the fact that ranges farther
behind skip any validation at all would cause a definite problem with trying
to continue parsing past invalid XML. As it stands, any range that is farther
behind should throw the same exception when it reaches the point where the
first range hit the invalid XML. If the first range could somehow continue,
the range farther behind would not do the same validation and so would not do
the right thing when it reached the point where moveToNextTag had been called
on the first range.
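
To make the "same exception" point concrete, here is a minimal sketch rather
than anything definitive. It assumes dxml is available as a dub dependency;
the input string and the helper firstError are made up for illustration and
are not part of dxml. It saves a copy of the range, runs the first range into
the bare &, and then runs the saved copy over the same input.

```d
import dxml.parser;
import std.stdio : writeln;

/// Hypothetical helper (not part of dxml): pops entities until the range is
/// empty or dxml rejects the input. Returns the exception message, or null
/// if the rest of the document parsed without error.
string firstError(R)(ref R range)
{
    try
    {
        while (!range.empty)
            range.popFront();
    }
    catch (XMLParsingException e)
    {
        return e.msg;
    }
    return null;
}

void main()
{
    // Made-up input: the unescaped '&' in the text is not well-formed XML.
    auto xml = "<root><a>fish & chips</a><b>ok</b></root>";

    auto lead = parseXML(xml);
    auto trailing = lead.save; // independent copy, still at <root>

    // The lead range is the one farthest along, so it hits the problem first.
    writeln("lead:     ", firstError(lead));

    // The trailing range covers the same input afterwards.
    writeln("trailing: ", firstError(trailing));
}
```

Both ranges are expected to report a failure at the same spot. A
moveToNextTag on lead would leave trailing with no record of what had been
skipped, which is the problem described above.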

- Jonathan M Davis


