GSoC 2016 - std.experimental.xml after a month

Thu Jun 23 13:04:26 PDT 2016

-- Brace yourself: a very long post is coming --

Hi,

One month after the official GSoC start, I want to share with you 
what's in std.experimental.xml and what will hopefully be there. 
If you have any questions, suggestions for improvement, or anything 
else to say, just leave a comment here or open an issue on GitHub 
(https://github.com/lodo1995/experimental.xml).

In particular, if you think there are problems with the current 
structure of the project, or major flaws in the APIs that would 
be very difficult to fix at a later stage, please let me know. 
(Walter and Andrei, I'd really appreciate your feedback here.)

Thank you in advance to everyone who takes the time to read this...

What is working?
- Four lexers are provided to abstract different kinds of input 
from the other layers, each with different speed characteristics;
- The parser splits the document into nodes, doing most of the 
hard work;
- A cursor sits on top of the parser, providing an API to advance 
in the document and get information about the current node; it 
supports string interning, which can drastically lower memory 
consumption (given that most nodes share names and attributes);
- A validating cursor works like a cursor, but lets the user plug 
in custom validators, which are executed while advancing through 
the input; in the future the library will provide some 
predefined validators to use with it;
- A very simple SAX API built on top of the cursor API is the 
last thing added and tested;
- A partial reimplementation of std.xml is there; when completed 
it will allow a gradual code transition.
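
To make the string interning idea concrete, here is a minimal sketch 
in D. It is not the Interner from the repository: the StringInterner 
name and its intern method are made up for illustration, and it 
simply relies on the built-in (GC-backed) associative array.

```d
/// Hypothetical minimal interner, for illustration only.
struct StringInterner
{
    private string[string] pool; // canonical copy of each distinct string

    /// Returns the canonical copy of `s`, storing `s` itself on first sight.
    string intern(string s)
    {
        if (auto p = s in pool)
            return *p;   // already seen: reuse the stored copy
        pool[s] = s;     // first occurrence becomes the canonical copy
        return s;
    }

    /// Number of distinct strings stored so far.
    size_t length() const { return pool.length; }
}

void main()
{
    StringInterner interner;

    // Two separately allocated strings with the same content,
    // as a parser would produce for repeated element names.
    string a = "book".idup;
    string b = "book".idup;
    assert(a.ptr !is b.ptr);

    auto x = interner.intern(a);
    auto y = interner.intern(b);

    // Both results share the same storage; `b` can now be collected.
    assert(x.ptr is y.ptr);
    assert(interner.length == 1);
}
```

In a document with thousands of elements but only a handful of 
distinct names, only that handful of strings stays alive.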

What am I working on right now?
I'm trying to implement the DOM Level 3 API. The API per se is 
not that difficult, but the infrastructure I'm building around it 
is hell. In fact, I'm trying to make the DOM nodes reference 
counted and allocated with a custom allocator, to allow their 
usage in @nogc code. This is quite painful (because the DOM has 
lots of circular references, and "normal" reference counting does 
not work with them), but with enough time I will probably manage 
to make it work.
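
To show why the cycles hurt, here is a small sketch in D of the usual 
workaround: the parent owns its child through a counted reference, 
while the child's pointer back to the parent is not counted. This is 
not the code in the repository; Node, makeNode, adopt and release are 
hypothetical names, plain malloc/free stand in for a custom 
allocator, and each node has at most one child to keep it short.

```d
import core.stdc.stdlib : free, malloc;

struct Node
{
    size_t refs;      // number of owning (counted) references
    Node* parent;     // back-pointer: NOT counted, so no cycle is formed
    Node* firstChild; // owning pointer: accounts for one count on the child
}

size_t liveNodes; // allocated nodes not yet freed, for the demo only

/// Allocates a node owned by the caller (refs == 1).
Node* makeNode() @nogc nothrow
{
    auto n = cast(Node*) malloc(Node.sizeof);
    assert(n !is null, "out of memory");
    *n = Node(1, null, null);
    ++liveNodes;
    return n;
}

/// Drops one counted reference; frees the node when none are left.
void release(Node* n) @nogc nothrow
{
    if (n is null || --n.refs > 0)
        return;
    if (n.firstChild !is null)
    {
        n.firstChild.parent = null; // the parent is going away
        release(n.firstChild);      // give up ownership of the child
    }
    free(n);
    --liveNodes;
}

/// Makes `child` the (single) child of `parent`.
void adopt(Node* parent, Node* child) @nogc nothrow
{
    ++child.refs;              // the parent now co-owns the child
    child.parent = parent;     // deliberately does not touch parent.refs
    parent.firstChild = child;
}

void main()
{
    auto root = makeNode();
    auto child = makeNode();
    adopt(root, child);

    release(child);          // our own handle goes away, the parent's stays
    assert(liveNodes == 2);

    release(root);           // frees root, which in turn frees child
    assert(liveNodes == 0);
}
```

If adopt also incremented parent.refs, the last release(root) would 
leave the count at 1 and neither node would ever be freed. That is 
the problem with "normal" reference counting on a tree with parent 
pointers.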

What is planned for the near future?
- Once the DOM classes are usable (even if not 100% complete), 
I will start working on a DOM parser to build them from the 
source;
- DTD checking and entity substitution have to be implemented, and 
they will (I hope) fit nicely as pluggable components for the 
validating cursor;
- And of course some APIs to output XML.

What is (incidentally) inside the repository?
- Along with the DOM classes comes a wrapper that allows class 
instances to be allocated with a custom allocator and reference 
counted (that is, a RefCounted!T that works only for classes);
- A wonderful (or maybe not) benchmark driver that benchmarks the 
various components with various kinds of randomly generated files 
and prints some wonderful statistics and graphs;
- Needed by the benchmarking code, a simple API to collect 
statistics (average, median, deviation) from a range of 
measurements;
- Needed by the cursor API, an Interner that can intern not only 
strings, but any array or class.
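
For the statistics helper, the three figures can be computed roughly 
as in the sketch below. The Summary struct and summarize function are 
hypothetical names, not the API in the repository; I also assume here 
that the measurements come as a non-empty array of doubles and that 
"deviation" means the population standard deviation.

```d
import std.algorithm.iteration : map, sum;
import std.algorithm.sorting : sort;
import std.math : sqrt;

/// Hypothetical result type, for illustration only.
struct Summary
{
    double mean;
    double median;
    double deviation; // population standard deviation
}

/// Computes mean, median and standard deviation of `samples`.
Summary summarize(const(double)[] samples)
{
    assert(samples.length > 0, "need at least one measurement");

    auto sorted = samples.dup; // sort a copy, leave the input untouched
    sort(sorted);

    immutable n = sorted.length;
    immutable mean = sorted.sum / n;
    immutable median = n % 2 == 1
        ? sorted[n / 2]
        : (sorted[n / 2 - 1] + sorted[n / 2]) / 2;
    immutable variance = sorted.map!(x => (x - mean) ^^ 2).sum / n;

    return Summary(mean, median, sqrt(variance));
}

void main()
{
    auto s = summarize([2.0, 4, 4, 4, 5, 5, 7, 9]);
    assert(s.mean == 5);
    assert(s.median == 4.5);
    assert(s.deviation == 2);
}
```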

Thank you again for your time and help.

Lodovico Giaretta