GSoC 2016 - std.experimental.xml after a month
Lodovico Giaretta via Digitalmars-d
digitalmars-d at puremagic.com
Thu Jun 23 13:04:26 PDT 2016
-- Brace yourself: a very long post is coming --
Hi,
One month after the official GSoC start, I want to share with you
what's in std.experimental.xml and what will hopefully be there.
If you have any questions, suggestions, or other comments, just
reply here or open an issue on GitHub
(https://github.com/lodo1995/experimental.xml).
In particular, if you think there are problems with the current
structure of the project, or major flaws in the APIs, that would
be very difficult to fix at a later stage, please let me know.
(Walter and Andrei, I'd really appreciate your feedback here).
Thank you in advance to all who take the time to read this...
What is working?
- Four lexers are provided to abstract different kinds of input
from the other layers, providing different speed characteristics;
- The parser splits the document into nodes, doing most of the
hard work;
- A cursor sits on top of the parser, providing an API to advance
in the document and get information about the current node; it
supports string interning, which can drastically lower memory
consumption (given that most nodes share names and attributes);
- A validating cursor is the same as a cursor, but lets the
user plug in custom validators, which are executed while advancing
through the input; in the future the library will provide some
predefined validators to use with it;
- A very simple SAX API built on top of the cursor API is the
last thing added and tested;
- A partial reimplementation of std.xml is there; when completed
it will allow a gradual code transition.
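To make the layering concrete, here is a minimal sketch of how a SAX-style API can be driven by a cursor. It is written in Python rather than D, and it is not the library's actual API: ToyCursor, its enter/next/exit methods, sax_walk and Recorder are all hypothetical names invented for this illustration, and the "document" is an already-parsed nested tuple rather than real XML.

```python
class ToyCursor:
    """Hypothetical cursor over a tree of (name, children) tuples."""

    def __init__(self, root):
        self._siblings = [root]   # nodes at the current depth
        self._index = 0           # position among those siblings
        self._stack = []          # saved positions of the ancestors

    def name(self):
        return self._siblings[self._index][0]

    def enter(self):
        """Move to the first child; return False if there is none."""
        children = self._siblings[self._index][1]
        if not children:
            return False
        self._stack.append((self._siblings, self._index))
        self._siblings, self._index = children, 0
        return True

    def next(self):
        """Move to the next sibling; return False at the last one."""
        if self._index + 1 >= len(self._siblings):
            return False
        self._index += 1
        return True

    def exit(self):
        """Move back to the parent of the current node."""
        self._siblings, self._index = self._stack.pop()


def sax_walk(cursor, handler):
    """Turn cursor movements into open/close callbacks (SAX style)."""
    while True:
        handler.on_open(cursor.name())
        if cursor.enter():
            sax_walk(cursor, handler)
            cursor.exit()
        handler.on_close(cursor.name())
        if not cursor.next():
            break


class Recorder:
    """Example handler that just records the events it receives."""

    def __init__(self):
        self.events = []

    def on_open(self, name):
        self.events.append("<" + name + ">")

    def on_close(self, name):
        self.events.append("</" + name + ">")


doc = ("catalog", [("book", [("title", [])]), ("book", [])])
rec = Recorder()
sax_walk(ToyCursor(doc), rec)
```

The point of the sketch is only that the SAX layer needs nothing beyond the cursor's navigation primitives, which is why it can stay very simple.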
What am I working on right now?
I'm trying to implement the DOM Level 3 API. The API per se is
not that difficult, but the infrastructure I'm building around it
is hellish. In fact, I'm trying to make the DOM nodes
reference-counted and allocated with a custom allocator, to allow
their usage in @nogc code. This is quite painful (because the DOM
has lots of circular references, and "normal" reference counting
does not work with them), but with enough time I will probably
manage to make it work.
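For readers unfamiliar with the problem: a parent that owns its children while each child also owns its parent forms a cycle, so neither count ever reaches zero. One common way around this (not necessarily the approach taken in the repository) is to make the upward link a weak, non-owning reference. The sketch below shows the idea in Python using the standard weakref module; the Node class and append_child are hypothetical names, and the immediate release it demonstrates relies on CPython's reference counting.

```python
import weakref


class Node:
    """Toy DOM node: strong references go down, weak references go up."""

    def __init__(self, name):
        self.name = name
        self.children = []    # strong (owning) references to children
        self._parent = None   # weak reference to the parent, or None

    @property
    def parent(self):
        # Dereference the weak link; yields None once the parent is gone.
        return self._parent() if self._parent is not None else None

    def append_child(self, child):
        child._parent = weakref.ref(self)   # does not keep self alive
        self.children.append(child)
        return child


root = Node("root")
leaf = root.append_child(Node("item"))
assert leaf.parent is root

probe = weakref.ref(root)   # lets us observe whether root is still alive
del root                    # drop the only strong reference to the root
```

After the del, nothing owns the root any more, so it is released even though leaf still exists; leaf.parent then simply reports that there is no parent. With a strong upward link, the pair would have kept each other alive under plain reference counting.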
What is planned for the near future?
- Once the DOM classes are usable (even if not 100% complete)
I will start working on a DOM parser to build them from the
source;
- DTD checking and entity substitution have to be implemented, and
they will (I hope) fit nicely as pluggable components for the
validating cursor;
- And of course some APIs to output XML.
What is (incidentally) inside the repository?
- Along with the DOM classes comes a wrapper that allows classes
to be allocated with a custom allocator and reference-counted
(that is, a RefCounted!T that works only for classes);
- A wonderful (or maybe not) benchmark driver that benchmarks the
various components with various kinds of randomly generated files
and prints some wonderful statistics and graphs;
- Needed by the benchmarking code, a simple API to collect
statistics (average, median, deviation) from a range of
measurements;
- Needed by the cursor API, an Interner that can intern not only
strings, but any array or class.
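As an illustration of what interning buys the cursor, here is a minimal sketch in Python (again not D, and not the repository's Interner). The class and method names are hypothetical, and it assumes the interned values are hashable, so tuples stand in for arrays.

```python
class Interner:
    """Keeps one canonical copy of every distinct value it is given."""

    def __init__(self):
        self._pool = {}

    def intern(self, value):
        # setdefault stores the value the first time it is seen and
        # returns the previously stored copy on every later call.
        return self._pool.setdefault(value, value)

    def __len__(self):
        return len(self._pool)


pool = Interner()

# Two equal strings built at run time are separate objects in memory.
a = "".join(["bo", "ok"])
b = "".join(["b", "ook"])
name_a = pool.intern(a)
name_b = pool.intern(b)

# The same works for any hashable value, e.g. a tuple of attribute names.
attrs_a = pool.intern(tuple(["id", "lang"]))
attrs_b = pool.intern(tuple(["id", "lang"]))
```

Here name_a and name_b end up being the very same object, as do attrs_a and attrs_b, so a document with thousands of identically named nodes stores each name only once.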
Thank you again for your time and help.
Lodovico Giaretta