On the subject of an XML parser

solidstate1991 laszloszeremi at outlook.com
Thu Sep 1 20:45:54 UTC 2022


On Thursday, 25 August 2022 at 19:41:19 UTC, solidstate1991 wrote:
> I took a look at experimental.xml. According to its tests, its 
> biggest issue is that it accepts malformed documents. I'll 
> attempt to reverse-engineer the code, then add the necessary 
> checks to reject the malformed documents. Since it has multiple 
> options for allocators (stdx-allocator), it'll be a bit of a 
> challenge, but at worst I can strip that function and replace 
> it with GC only.

So work has begun here: 
https://github.com/ZILtoid1991/experimental.xml

Things I've done so far:
  * Stripped the allocators and the custom error handling 
functions. Not many people use allocators anyway; they just 
complicate the project, and GC is otherwise the best option for 
anything that builds a complex tree structure. With that gone, I 
can just use exceptions for error handling, which can be toggled 
with a flag: turning it off will allow parsing badly formed XML 
documents, and even SGML in theory.
  * Simplified a lot of things in general, using array slicing and 
appending.
  * Enabled character escaping, which led me into the DTD hellhole.
  * Enabled checking for bad characters in names and texts.
  * Started working on the processing of XML declarations 
(important for setting version and checking for correct 
encoding), and the DTD.
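
To illustrate the strictness flag and the name check from the list 
above, here is a minimal sketch. It is written in Python rather 
than D purely for illustration and is not code from the repository. 
The names NameChecker, XMLWellFormednessError and strict are 
hypothetical, and the pattern covers only a simplified subset of the 
XML 1.0 Name production.

```python
import re

# Simplified subset of the XML 1.0 Name production (assumption: the
# full grammar allows many more Unicode ranges than listed here).
_NAME_START = r"A-Za-z_:\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF"
_NAME_RE = re.compile(rf"[{_NAME_START}][{_NAME_START}0-9.\-\u00B7]*")


class XMLWellFormednessError(Exception):
    """Hypothetical exception type, not the library's actual class."""


class NameChecker:
    """Checks element/attribute names; 'strict' is a hypothetical flag."""

    def __init__(self, strict=True):
        self.strict = strict
        self.problems = []  # collected only when strict is off

    def check_name(self, name):
        # fullmatch requires the whole string to fit the pattern
        if _NAME_RE.fullmatch(name):
            return True
        message = f"invalid XML name: {name!r}"
        if self.strict:
            # Strict mode: reject the malformed document outright
            raise XMLWellFormednessError(message)
        # Lenient mode: note the problem and keep parsing
        self.problems.append(message)
        return False
```

With strict left on, a name such as "1abc" raises the exception. 
With strict=False the same call returns False and the message is 
appended to problems, so parsing of a badly formed document can 
continue.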

I know that removing the allocators might rule my project out of 
inclusion in Phobos, but even then I can just release it as a 
regular dub library. Soon I'll be renaming it to newXML or 
something similar, while keeping the credits to its previous 
authors.

