dxml 0.2.0 released
Patrick Schluter
Patrick.Schluter at bbox.fr
Tue Feb 13 21:18:12 UTC 2018
On Tuesday, 13 February 2018 at 20:10:59 UTC, Jonathan M Davis
wrote:
> On Tuesday, February 13, 2018 15:22:32 Kagamin via
> Digitalmars-d-announce wrote:
>> On Monday, 12 February 2018 at 16:50:16 UTC, Jonathan M Davis
>> wrote:
>> > The core problem is that entity references get replaced with
>> > more XML that needs to be parsed. So, they can't simply be
>> > passed on for post-processing. As I understand it, they have
>> > to be replaced while the parsing is going on. And that means
>> > that you can't do something like return slices of the
>> > original input that don't bother with the entity references
>> > and then have a separate parser take that and process it
>> > further to deal with the entity references. The first parser
>> > has to deal with them, and that means not returning slices
>> > of the original input unless you're dealing purely with
>> > strings and are willing to allocate new strings in the cases
>> > where the data needs to be mutated because of an entity
>> > reference.
>>
>> Standard entities like &amp; have the same problem, so the
>> same solution should work too.
>
> That depends on what exactly an entity reference can contain.
> If it can do something like put a start tag in there, and then
> it has to be terminated by the document putting an end tag in
> there or another entity reference containing an end tag, then
> it can't be handled after the fact like &amp; can be, since
> &amp; is just replaced by text. If an entity reference can't
> contain a start tag without a matching end tag, then sure. But
> I find the XML spec to be surprisingly hard to understand with
> regards to entity references. It's not clear to me where it's
> even legal to put them or not, let alone what you're allowed to
> put in them exactly. And I can't even really trust the XML
> grammar as long as entity references are involved, because the
> grammar in the spec is the grammar _after_ entity references
> have all been replaced, which I was quite dismayed to figure
> out.
>
> If it's 100% sure that entity references can be treated as just
> text and that you can't end up with stuff like start tags or
> end tags being inserted and messing with the parsing such that
> they all have to be replaced for the XML to be correctly
> parsed, then I have no problem passing entity references along,
> and a higher level parser could try to do something with them,
> but it's not clear to me at all that an XML document with
> entity references is correct enough to be parsed while not
> replacing the entity references with whatever XML markup they
> contain. I had originally passed them along with the idea that
> a higher level parser could do something with them, but I
> decided that I couldn't do that if you could do something like
> drop a start tag in there and change the meaning of the stuff
> that needs to be parsed that isn't directly in the entity
> reference.
>
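As far as I can tell, section 4.3.2 of the XML 1.0 spec requires the
replacement text of an internal parsed entity to match the "content"
production, so an entity may carry complete elements but not a lone
start or end tag. Here is a minimal sketch of both cases using Python's
standard xml.etree.ElementTree. It is only an illustration and has
nothing to do with dxml; the two sample documents and the helper name
child_tags are made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample documents, not taken from the thread.
# The entity holds a complete, balanced element.
BALANCED = '<!DOCTYPE r [<!ENTITY e "<b>bold</b>">]><r>&e;</r>'
# The entity holds only a start tag; the document supplies the end tag.
UNBALANCED = '<!DOCTYPE r [<!ENTITY s "<b>">]><r>&s;text</b></r>'


def child_tags(doc):
    """Return the tags of the root's direct children, or None if the
    parser rejects the document as not well-formed."""
    try:
        root = ET.fromstring(doc)
    except ET.ParseError:
        return None
    return [child.tag for child in root]


if __name__ == "__main__":
    print(child_tags(BALANCED))
    print(child_tags(UNBALANCED))
```

With the expat-based parser that ships with CPython, I would expect the
balanced entity to show up as a real <b> child element (so it is markup,
not just text), and the unbalanced one to be rejected outright.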
There's also the issue that entity references open a whole can of
worms concerning security. It is quite possible to have an
exponentially growing entity replacement that can take down any
parser that expands it naively.
<!DOCTYPE root [
<!ELEMENT root ANY>
<!ENTITY LOL "LOL">
<!ENTITY LOL1
"&LOL;&LOL;&LOL;&LOL;&LOL;&LOL;&LOL;&LOL;&LOL;&LOL;">
<!ENTITY LOL2
"&LOL1;&LOL1;&LOL1;&LOL1;&LOL1;&LOL1;&LOL1;&LOL1;&LOL1;&LOL1;">
<!ENTITY LOL3
"&LOL2;&LOL2;&LOL2;&LOL2;&LOL2;&LOL2;&LOL2;&LOL2;&LOL2;&LOL2;">
<!ENTITY LOL4
"&LOL3;&LOL3;&LOL3;&LOL3;&LOL3;&LOL3;&LOL3;&LOL3;&LOL3;&LOL3;">
<!ENTITY LOL5
"&LOL4;&LOL4;&LOL4;&LOL4;&LOL4;&LOL4;&LOL4;&LOL4;&LOL4;&LOL4;">
<!ENTITY LOL6
"&LOL5;&LOL5;&LOL5;&LOL5;&LOL5;&LOL5;&LOL5;&LOL5;&LOL5;&LOL5;">
<!ENTITY LOL7
"&LOL6;&LOL6;&LOL6;&LOL6;&LOL6;&LOL6;&LOL6;&LOL6;&LOL6;&LOL6;">
<!ENTITY LOL8
"&LOL7;&LOL7;&LOL7;&LOL7;&LOL7;&LOL7;&LOL7;&LOL7;&LOL7;&LOL7;">
<!ENTITY LOL9
"&LOL8;&LOL8;&LOL8;&LOL8;&LOL8;&LOL8;&LOL8;&LOL8;&LOL8;&LOL8;">
]>
<root>&LOL9;</root>
Hope you have enough memory (this expands to 1 000 000 000 LOLs,
i.e. 3 000 000 000 characters)
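
To put a number on it without actually expanding anything, here is a
rough sketch in Python that computes the expanded length of an entity
from the declarations alone. The function name expanded_length and the
regular expression for references are my own simplifications, and it
assumes the entities are not recursive (which XML forbids anyway). A
parser could use the same idea to refuse a document before blowing up.

```python
import re

# Simplified pattern for a general entity reference such as &LOL1;
# (an assumption for this sketch, not the full XML Name production).
REF = re.compile(r"&([A-Za-z_][\w.-]*);")


def expanded_length(entities, name, cache=None):
    """Length in characters of entity `name` after full expansion,
    computed without ever building the expanded string."""
    if cache is None:
        cache = {}
    if name in cache:
        return cache[name]
    text = entities[name]
    total = 0
    pos = 0
    for match in REF.finditer(text):
        total += match.start() - pos          # literal text before the reference
        total += expanded_length(entities, match.group(1), cache)
        pos = match.end()
    total += len(text) - pos                  # literal text after the last reference
    cache[name] = total
    return total


# The same declarations as in the DOCTYPE above.
entities = {"LOL": "LOL"}
for i in range(1, 10):
    previous = "LOL" if i == 1 else f"LOL{i - 1}"
    entities[f"LOL{i}"] = f"&{previous};" * 10

if __name__ == "__main__":
    print(expanded_length(entities, "LOL9"))
```

Each level multiplies the size by 10, so nine levels on top of a
3-character string give 3 * 10^9 characters from well under a kilobyte
of input.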