dxml 0.2.0 released

Patrick Schluter Patrick.Schluter at bbox.fr
Tue Feb 13 21:18:12 UTC 2018


On Tuesday, 13 February 2018 at 20:10:59 UTC, Jonathan M Davis 
wrote:
> On Tuesday, February 13, 2018 15:22:32 Kagamin via 
> Digitalmars-d-announce wrote:
>> On Monday, 12 February 2018 at 16:50:16 UTC, Jonathan M Davis
>> wrote:
>> > The core problem is that entity references get replaced with 
>> > more XML that needs to be parsed. So, they can't simply be 
>> > passed on for post-processing. As I understand it, they have 
>> > to be replaced while the parsing is going on. And that means 
>> > that you can't do something like return slices of the 
>> > original input that don't bother with the entity references 
>> > and then have a separate parser take that and process it 
>> > further to deal with the entity references. The first parser 
>> > has to deal with them, and that means not returning slices 
>> > of the original input unless you're dealing purely with 
>> > strings and are willing to allocate new strings in the cases 
>> > where the data needs to be mutated because of an entity 
>> > reference.
>>
>> Standard entities like &amp; have the same problem, so the 
>> same solution should work too.
>
> That depends on what exactly an entity reference can contain. 
> If it can do something like put a start tag in there, and then 
> it has to be terminated by the document putting an end tag in 
> there or another entity reference containing an end tag, then 
> it can't be handled after the fact like &amp; can be, since 
> &amp; is just replaced by text. If an entity reference can't 
> contain a start tag without a matching end tag, then sure. But 
> I find the XML spec to be surprisingly hard to understand with 
> regards to entity references. It's not clear to me where it's 
> even legal to put them or not, let alone what you're allowed to 
> put in them exactly. And I can't even really trust the XML 
> grammar as long as entity references are involved, because the 
> grammar in the spec is the grammar _after_ entity references 
> have all been replaced, which I was quite dismayed to figure 
> out.
>
> If it's 100% sure that entity references can be treated as just 
> text and that you can't end up with stuff like start tags or 
> end tags being inserted and messing with the parsing such that 
> they all have to be replaced for the XML to be correctly 
> parsed, then I have no problem passing entity references along, 
> and a higher level parser could try to do something with them, 
> but it's not clear to me at all that an XML document with 
> entity references is correct enough to be parsed while not 
> replacing the entity references with whatever XML markup they 
> contain. I had originally passed them along with the idea that 
> a higher level parser could do something with them, but I 
> decided that I couldn't do that if you could do something like 
> drop a start tag in there and change the meaning of the stuff 
> that needs to be parsed that isn't directly in the entity 
> reference.
>

There's also the issue that entity references open a whole can of 
worms concerning security. It is quite possible to have an 
exponentially growing entity replacement that can take down any 
parser that expands entities without a limit.

<!DOCTYPE root [
  <!ELEMENT root ANY>
  <!ENTITY LOL "LOL">
  <!ENTITY LOL1 
"&LOL;&LOL;&LOL;&LOL;&LOL;&LOL;&LOL;&LOL;&LOL;&LOL;">
  <!ENTITY LOL2 
"&LOL1;&LOL1;&LOL1;&LOL1;&LOL1;&LOL1;&LOL1;&LOL1;&LOL1;&LOL1;">
  <!ENTITY LOL3 
"&LOL2;&LOL2;&LOL2;&LOL2;&LOL2;&LOL2;&LOL2;&LOL2;&LOL2;&LOL2;">
  <!ENTITY LOL4 
"&LOL3;&LOL3;&LOL3;&LOL3;&LOL3;&LOL3;&LOL3;&LOL3;&LOL3;&LOL3;">
  <!ENTITY LOL5 
"&LOL4;&LOL4;&LOL4;&LOL4;&LOL4;&LOL4;&LOL4;&LOL4;&LOL4;&LOL4;">
  <!ENTITY LOL6 
"&LOL5;&LOL5;&LOL5;&LOL5;&LOL5;&LOL5;&LOL5;&LOL5;&LOL5;&LOL5;">
  <!ENTITY LOL7 
"&LOL6;&LOL6;&LOL6;&LOL6;&LOL6;&LOL6;&LOL6;&LOL6;&LOL6;&LOL6;">
  <!ENTITY LOL8 
"&LOL7;&LOL7;&LOL7;&LOL7;&LOL7;&LOL7;&LOL7;&LOL7;&LOL7;&LOL7;">
  <!ENTITY LOL9 
"&LOL8;&LOL8;&LOL8;&LOL8;&LOL8;&LOL8;&LOL8;&LOL8;&LOL8;&LOL8;">
]>
<root>&LOL9;</root>

Hope you have enough memory (this expands to 1 000 000 000 
LOLs, i.e. 3 000 000 000 characters)
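
To put a number on that without actually exhausting memory, here is a 
minimal Python sketch that computes the expanded length of an entity 
without ever building the string. The helper name expanded_length and 
the simplified entity-name regex are my own assumptions for the 
illustration; this is not how dxml or any particular parser does it, 
and it does no cycle detection (recursive entities are illegal in XML 
anyway).

```python
import re

# Simplified pattern for a general entity reference such as "&LOL3;".
# Real XML names allow more characters; this is enough for the example.
ENTITY_REF = re.compile(r"&([A-Za-z_][\w.-]*);")

def expanded_length(name, entities, cache=None):
    """Length in characters of entity `name` after full expansion.

    Hypothetical helper: sums lengths recursively and memoizes them,
    so the expanded text is never materialized.
    """
    if cache is None:
        cache = {}
    if name in cache:
        return cache[name]
    text = entities[name]
    total = 0
    pos = 0
    for match in ENTITY_REF.finditer(text):
        total += match.start() - pos          # literal text before the reference
        total += expanded_length(match.group(1), entities, cache)
        pos = match.end()
    total += len(text) - pos                  # literal text after the last reference
    cache[name] = total
    return total

# Rebuild the declarations from the DOCTYPE above.
entities = {"LOL": "LOL"}
for i in range(1, 10):
    previous = "LOL" if i == 1 else f"LOL{i - 1}"
    entities[f"LOL{i}"] = f"&{previous};" * 10

size = expanded_length("LOL9", entities)
print(size, "characters,", size // len("LOL"), "copies of LOL")
```

A parser could run the same kind of bookkeeping while expanding and 
bail out once a configured limit is exceeded, instead of allocating 
the whole thing.
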




