dxml 0.2.0 released

Jonathan M Davis newsgroup.d at jmdavisprog.com
Tue Feb 13 22:00:59 UTC 2018


On Tuesday, February 13, 2018 21:18:12 Patrick Schluter via
Digitalmars-d-announce wrote:
> There's also the issue that entity references open a whole can of
> worms concerning security. It's quite possible to have an
> exponentially growing entity replacement that can take down any
> parser.

Well, if dxml just passes the entity references along unparsed beyond
validating that the entity reference itself contains valid characters (e.g.
it's not something like &.; or & by itself), then dxml would still not be
replacing the entity references with anything. Any security or performance
problems associated with entity references would be left up to whatever
parser parsed the DTD section and then used dxml to parse the rest of the
XML and replaced the entity references in dxml's parsing results with
whatever they were.
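
To illustrate the exponential growth mentioned in the quoted text, here is a
minimal Python sketch (not dxml code). The entity names e0..e9 and their
declarations are hypothetical, and the helper expanded_length is an
assumption made for illustration. It computes how long a fully expanded
entity would be without ever building the string, which is the cost that
would fall on whatever code actually replaces the references.

```python
import re

# Simplified, ASCII-only pattern for a named entity reference such as &foo;
REF = re.compile(r"&([A-Za-z_][\w.\-]*);")

def expanded_length(name, entities, cache=None, active=None):
    """Length of an entity after recursive expansion, without expanding it."""
    cache = {} if cache is None else cache
    active = set() if active is None else active
    if name in cache:
        return cache[name]
    if name in active:
        raise ValueError(f"recursive entity: {name}")
    active.add(name)
    value = entities[name]
    # Literal characters count once; each nested reference counts as its
    # own expanded length.
    total = len(REF.sub("", value))
    for match in REF.finditer(value):
        total += expanded_length(match.group(1), entities, cache, active)
    active.discard(name)
    cache[name] = total
    return total

# Hypothetical declarations: each level refers to the previous one ten times.
entities = {"e0": "lol"}
for i in range(1, 10):
    entities[f"e{i}"] = f"&e{i - 1};" * 10

declared_size = sum(len(v) for v in entities.values())
print(declared_size)                     # 363
print(expanded_length("e9", entities))   # 3000000000
```

A few hundred characters of declarations expand to three billion characters,
which is why a parser that never performs the replacement sidesteps the
problem.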

The big problem is how the entity references affect the parsing. If start
tags can be dropped in and affect the parsing (and it's still not clear to
me from the spec whether that's legal - there is a section talking about
being nested properly which might indicate that that's not legal, but it's
not very specific or clear), and if it's legal to do something like use an
entity reference for a tag name - e.g. <&foo;>, then that's a serious
problem. And problems like that are the main reason why I completely dropped
any attempt to do anything with the DTD section.
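
The concern above can be made concrete with a small Python sketch (not dxml
code). It assumes, purely for the sake of argument, that an entity could be
replaced with a lone start tag. The entity name "open" and the helper
tags_balanced are hypothetical, and the tag pattern is a rough approximation
rather than a real XML tokenizer.

```python
import re

# Rough approximation of a start, end, or empty-element tag.
TAG = re.compile(r"<(/?)([A-Za-z_][\w.\-]*)[^<>]*?(/?)>")

def tags_balanced(xml):
    """Return True if every start tag has a matching, properly nested end tag."""
    stack = []
    for match in TAG.finditer(xml):
        closing, name, self_closing = match.groups()
        if self_closing:
            continue
        if closing:
            if not stack or stack.pop() != name:
                return False
        else:
            stack.append(name)
    return not stack

doc = "<root>&open;text</b></root>"
hypothetical_entities = {"open": "<b>"}  # assumed replacement text
expanded = doc.replace("&open;", hypothetical_entities["open"])

print(tags_balanced(doc))       # False: </b> has no start tag yet
print(tags_balanced(expanded))  # True: the replacement supplied <b>
```

A parser that passes &open; along untouched sees an end tag with no start
tag, so under that assumption it could not validate the document structure
on its own.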

If entity references are only legal in the text between start and end tags
and between the quotes of attribute values, and whatever they're replaced
with cannot actually affect anything else in the XML document (i.e. it can't
just be a start or end tag or anything like that - it has to be fully
parseable on its own and not affect the parsing of the document itself),
then passing them along should be fine.

Basically, if I can change dxml so that in the places where it currently
allows one of the standard entity references to be, it then also allows
other entity references but passes them along without replacing them instead
of throwing an XMLParsingException, and that works without having documents
be screwed up due to missing start tags or something, then passing them
along should be fine. But if entity references allow arbitrary enough chunks
of XML, that doesn't work. It also doesn't work if entity references are
allowed in places other than the text between start and end tags or within
attribute values. And it's not clear to me at all what is legal in an entity
reference or where exactly they're legal. The spec talks about the grammar
being the grammar _after_ all of the references have been replaced, which
makes the grammar rather untrustworthy, and I find the spec very hard to
understand in general.
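
The pass-along behavior described above could look roughly like the
following Python sketch. This is not dxml's API. The function decode_text is
hypothetical, the name pattern is a simplified ASCII-only approximation of
the XML Name production, and a ValueError stands in for dxml's
XMLParsingException. The five standard entity references and numeric
character references are replaced, any other syntactically valid reference
is left exactly as written, and malformed ones are rejected.

```python
import re

STANDARD = {"amp": "&", "lt": "<", "gt": ">", "apos": "'", "quot": '"'}

# An ampersand, a candidate name, and an optional terminating semicolon.
AMP = re.compile(r"&([^;&<\s]*)(;?)")
# Simplified, ASCII-only approximation of an XML name.
NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9_:.\-]*")
# Numeric character references: &#65; or &#x41;
CHARREF = re.compile(r"#(?:x([0-9A-Fa-f]+)|([0-9]+))")

def decode_text(text):
    """Replace standard references, pass other valid ones along unchanged."""
    def replace(match):
        name, semicolon = match.group(1), match.group(2)
        if semicolon:
            char_ref = CHARREF.fullmatch(name)
            if char_ref:
                hex_digits, dec_digits = char_ref.groups()
                return chr(int(hex_digits, 16) if hex_digits else int(dec_digits))
            if NAME.fullmatch(name):
                # Unknown but well-formed references are returned untouched.
                return STANDARD.get(name, match.group(0))
        raise ValueError(f"malformed entity reference at offset {match.start()}")
    return AMP.sub(replace, text)

print(decode_text("a &lt; b &foo; c"))  # a < b &foo; c
```

With this approach, whatever code handled the DTD would be responsible for
replacing &foo; afterwards, while input such as "&.;" or a bare "&" is still
rejected up front.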

Regardless, there's no risk of dxml's parser ever being changed to actually
replace entity references. That doesn't work with returning slices of the
original input, and it really doesn't work with a parser that's just
supposed to take a range of characters and parse it. To fully handle all of
the DTD stuff means actually reading files from disk or from the internet -
which of course is where the security problems come in, but it also means
that you're not just dealing with a parser anymore. In principle, dxml's
parser should be pure (though some implementation details make it so that
it isn't right now), whereas an XML parser that fully handles the DTD
section could never be pure.

- Jonathan M Davis


