On the subject of an XML parser
solidstate1991
laszloszeremi at outlook.com
Thu Sep 1 20:45:54 UTC 2022
On Thursday, 25 August 2022 at 19:41:19 UTC, solidstate1991 wrote:
> I took a look at experimental.xml. According to its tests, its
> biggest issue is that it accepts malformed documents. I'll
> attempt to reverse-engineer the code, then add the necessary
> checks to reject the malformed documents. Since it has multiple
> options for allocators (stdx-allocator), it'll be a bit of a
> challenge, but at worst I can strip that functionality and
> replace it with GC only.
So work has begun here:
https://github.com/ZILtoid1991/experimental.xml
Things I've done so far:
* Stripped the allocators and the custom error handling
functions. Not many people use allocators anyway; they just
complicate the project, and the GC is the best option for
anything that builds a complex tree structure. With them gone, I
can simply use exceptions for error handling, which can be
toggled with a flag: turning them off will enable parsing of
badly formed XML documents, and in theory even SGML.
* Simplified a lot of things in general by using array slicing
and appending.
* Enabled character escaping, which led me into the DTD hellhole.
* Enabled checking for bad characters in names and text.
* Started working on processing the XML declaration (important
for setting the version and checking for the correct encoding)
and the DTD.
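To illustrate why escaping drags the DTD in: XML predefines only five entities (lt, gt, amp, apos, quot) plus numeric character references, so any other named entity only means something if a DTD declares it. Below is a minimal sketch of reference decoding with a strictness flag, written in Python purely for brevity (the project itself is D). The names `decode_references`, `XMLError`, `dtd_entities` and `strict` are hypothetical and not taken from experimental.xml; entity names are simplified to ASCII, and replacement text is not re-parsed for markup.

```python
import re

class XMLError(Exception):
    """Hypothetical error type raised for malformed input in strict mode."""

# The five entities XML predefines; all others must be declared in a DTD.
PREDEFINED = {"lt": "<", "gt": ">", "amp": "&", "apos": "'", "quot": '"'}

# '&' optionally followed by a reference body and ';'. The entity-name part is
# simplified to ASCII; the full rule is the XML Name production.
REF = re.compile(r"&(?:(#x[0-9A-Fa-f]+|#[0-9]+|[A-Za-z_:][A-Za-z0-9_:.-]*);)?")

def is_xml_char(cp):
    """Char production of XML 1.0: code points allowed in a document."""
    return (cp in (0x9, 0xA, 0xD) or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD or 0x10000 <= cp <= 0x10FFFF)

def decode_references(text, dtd_entities=None, strict=True):
    """Replace character and entity references in `text`.

    strict=True raises XMLError on anything malformed; strict=False leaves
    the offending text untouched, in the spirit of a lenient mode.
    """
    entities = {**PREDEFINED, **(dtd_entities or {})}

    def fail(m, reason):
        if strict:
            raise XMLError(f"{reason} at offset {m.start()}")
        return m.group(0)  # lenient: keep the original text

    def replace(m):
        ref = m.group(1)
        if ref is None:
            return fail(m, "bare '&'")
        if ref.startswith("#"):
            cp = int(ref[2:], 16) if ref.startswith("#x") else int(ref[1:])
            if is_xml_char(cp):
                return chr(cp)
            return fail(m, "invalid character reference")
        if ref in entities:
            # Simplification: replacement text is not re-parsed for markup.
            return entities[ref]
        return fail(m, f"undeclared entity &{ref};")

    return REF.sub(replace, text)
```

With `strict=False`, an undeclared `&copy;` or a stray `&` simply passes through, which is roughly what a "parse badly formed documents" mode would want.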
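The check for bad characters in names can follow the NameStartChar and NameChar productions of the XML 1.0 (Fifth Edition) specification. A rough sketch, again in Python for brevity and with hypothetical function names; it covers element and attribute names only, not the Char production for text content, and it does not apply the stricter namespace (QName) rules:

```python
# Code-point ranges from the NameStartChar production (XML 1.0, 5th ed.).
NAME_START_RANGES = (
    (0x3A, 0x3A),      # ':'
    (0x41, 0x5A),      # 'A'-'Z'
    (0x5F, 0x5F),      # '_'
    (0x61, 0x7A),      # 'a'-'z'
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF), (0x370, 0x37D),
    (0x37F, 0x1FFF), (0x200C, 0x200D), (0x2070, 0x218F),
    (0x2C00, 0x2FEF), (0x3001, 0xD7FF), (0xF900, 0xFDCF),
    (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
)

# Additional ranges NameChar allows after the first character.
NAME_EXTRA_RANGES = (
    (0x2D, 0x2E),      # '-' and '.'
    (0x30, 0x39),      # '0'-'9'
    (0xB7, 0xB7), (0x300, 0x36F), (0x203F, 0x2040),
)

def _in_ranges(cp, ranges):
    return any(lo <= cp <= hi for lo, hi in ranges)

def is_name_start_char(ch):
    return _in_ranges(ord(ch), NAME_START_RANGES)

def is_name_char(ch):
    return is_name_start_char(ch) or _in_ranges(ord(ch), NAME_EXTRA_RANGES)

def is_valid_name(name):
    """True if `name` matches the XML 1.0 Name production."""
    return (name != "" and is_name_start_char(name[0])
            and all(is_name_char(c) for c in name[1:]))
```

So `is_valid_name("ns:tag")` passes, while `"1abc"` and `"-x"` are rejected because digits and hyphens may not start a name.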
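For the XML declaration itself, XML 1.0 requires `version` first, then optional `encoding` and `standalone` in that order, each with matching quotes. A minimal regex-based sketch (Python, hypothetical names `parse_xml_declaration` and `XMLDeclError`) that extracts those three values; it does not compare the declared encoding with the document's actual byte encoding, which a real parser would still have to do:

```python
import re

class XMLDeclError(ValueError):
    """Hypothetical error for a malformed XML declaration."""

# VersionInfo, then optional EncodingDecl and SDDecl, in the required order.
# [ \t\r\n] is XML's whitespace production S (narrower than Python's \s).
XML_DECL = re.compile(r"""
    <\?xml
    [ \t\r\n]+ version [ \t\r\n]* = [ \t\r\n]*
        (['"]) (?P<version>1\.[0-9]+) \1
    (?: [ \t\r\n]+ encoding [ \t\r\n]* = [ \t\r\n]*
        (['"]) (?P<encoding>[A-Za-z][A-Za-z0-9._-]*) \3 )?
    (?: [ \t\r\n]+ standalone [ \t\r\n]* = [ \t\r\n]*
        (['"]) (?P<standalone>yes|no) \5 )?
    [ \t\r\n]* \?>
""", re.VERBOSE)

def parse_xml_declaration(doc):
    """Return version/encoding/standalone from a leading XML declaration.

    Returns None when the document has no declaration. Note that
    "<?xml-stylesheet ...?>" is an ordinary processing instruction.
    """
    if not doc.startswith("<?xml") or doc[5:6] not in ("", " ", "\t", "\r", "\n", "?"):
        return None
    m = XML_DECL.match(doc)
    if m is None:
        raise XMLDeclError("malformed XML declaration")
    return m.groupdict()
```

A declaration with the attributes out of order, such as `encoding` before `version`, is rejected rather than silently accepted.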
I know that removing the allocators might rule out my project's
inclusion in Phobos, but even then I can just release it as a
regular dub library. Soon I'll rename it to newXML or something
similar, while keeping the credit to its previous authors.
More information about the Digitalmars-d mailing list