GSoC 2016 - std.xml rewrite
Lodovico Giaretta via Digitalmars-d
digitalmars-d at puremagic.com
Tue Mar 8 10:01:25 PST 2016
On Tuesday, 8 March 2016 at 16:20:00 UTC, CraigDillabaugh wrote:
> Also, if you have concrete ideas feel free to post those on
> here if you want feedback.
I was thinking about the general structure of the parsing
library, and I came up with this schema:
Lexer -> Low Level Parser -> High Level API
There should be various lexers feeding the low level parser,
differing in the kind of input they accept:
- one accepting InputRanges, that can work with almost any data
source and does not require the entire input to be available at
the same time; its drawback is that it must allocate lots of small
strings (one for each token) and grow them one char at a time, so
it's not so fast;
- one accepting Slices, that benefits from fast searches and
slicing, without needing any additional allocation; its drawback is
that the entire input has to be loaded in RAM;
- a hybrid lexer, that tries to get the pros of both and the
cons of none.
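
To make the trade-off between the first two lexers concrete, here is a rough sketch. All names (SliceLexer, RangeLexer, atEnd, next) are hypothetical and not part of any existing module, and the tokenization is deliberately naive: it splits only on '<' and '>', ignoring comments, CDATA and quoted attribute values.

```d
import std.array : appender;
import std.range.primitives : ElementType, empty, front, isInputRange, popFront;
import std.string : indexOf;

// Hypothetical slice lexer: every token is a slice of the input,
// so no allocation happens, but the whole input must be in memory.
struct SliceLexer
{
    string input;

    bool atEnd() const { return input.length == 0; }

    // Returns either a whole "<...>" chunk or the text up to the next '<'.
    string next()
    {
        assert(!atEnd);
        size_t end;
        if (input[0] == '<')
        {
            auto close = input.indexOf('>');
            end = close < 0 ? input.length : cast(size_t) close + 1;
        }
        else
        {
            auto open = input.indexOf('<');
            end = open < 0 ? input.length : cast(size_t) open;
        }
        auto token = input[0 .. end];
        input = input[end .. $];
        return token;
    }
}

// Hypothetical InputRange lexer: works on any range of characters,
// but has to build each token one character at a time.
struct RangeLexer(R)
    if (isInputRange!R && is(ElementType!R : dchar))
{
    R input;

    bool atEnd() { return input.empty; }

    string next()
    {
        assert(!atEnd);
        auto buf = appender!string(); // one allocation per token
        if (input.front == '<')
        {
            while (!input.empty)
            {
                auto c = input.front;
                buf.put(c);
                input.popFront();
                if (c == '>')
                    break;
            }
        }
        else
        {
            while (!input.empty && input.front != '<')
            {
                buf.put(input.front);
                input.popFront();
            }
        }
        return buf.data;
    }
}
```

Both produce the same tokens for the same input; the difference is only in how they get there. A hybrid lexer would presumably slice whenever the current buffer allows it and fall back to copying only at buffer boundaries.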
There should be various APIs fed by the low level parser. Here
I took inspiration from Java:
- a DOM API;
- a push parser (like SAX), conceptually similar to the current
std.xml.ElementParser;
- a pull parser (somewhat inspired by StAX), that provides a
cursor to scroll the input and also an InputRange interface, for
easy integration with other D libraries (like std.algorithm).
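
As an illustration of the pull parser idea, a minimal sketch of a cursor that also works as an InputRange could look like this. The names PullCursor, Node and NodeKind are made up for the example, and the parsing is again naive (tags are split on '<' and '>' only, attributes are left unparsed inside the tag content).

```d
import std.range.primitives : isInputRange;
import std.string : indexOf;

enum NodeKind { startTag, endTag, text }

struct Node
{
    NodeKind kind;
    string content; // tag content (attributes unparsed) or raw text
}

// Hypothetical pull cursor: the caller asks for the next node when it
// wants one, and the same three members make it a valid InputRange.
struct PullCursor
{
    private string rest;
    private Node current;
    private bool done;

    this(string input)
    {
        rest = input;
        popFront(); // load the first node
    }

    @property bool empty() const { return done; }
    @property Node front() { return current; }

    void popFront()
    {
        if (rest.length == 0)
        {
            done = true;
            return;
        }
        if (rest[0] == '<')
        {
            auto close = rest.indexOf('>');
            size_t end = close < 0 ? rest.length : cast(size_t) close;
            bool isEnd = end > 1 && rest[1] == '/';
            current = Node(isEnd ? NodeKind.endTag : NodeKind.startTag,
                           rest[(isEnd ? 2 : 1) .. end]);
            rest = close < 0 ? "" : rest[end + 1 .. $];
        }
        else
        {
            auto open = rest.indexOf('<');
            size_t end = open < 0 ? rest.length : cast(size_t) open;
            current = Node(NodeKind.text, rest[0 .. end]);
            rest = rest[end .. $];
        }
    }
}

static assert(isInputRange!PullCursor);

// Usage with std.algorithm: collect the names of all start tags.
string[] startTagNames(string xml)
{
    import std.algorithm.iteration : filter, map;
    import std.array : array;

    return PullCursor(xml)
        .filter!(n => n.kind == NodeKind.startTag)
        .map!(n => n.content)
        .array;
}
```

The point of the range interface is the last function: once the cursor is an InputRange, filter, map and friends work on it without any XML-specific glue.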
Validating the input for well-formedness should probably be done
between the parser and the high level API, so that structural
issues (like missing close tags) are found and handled before
affecting, for example, the building of the Document object.
Checking the validity of the document (such as conformance to the
DTD) should instead be done on top of the high level API, which
gives easy access, for example, to namespaces and attributes
(which the low level parser leaves unparsed).
Both kinds of validators should be pluggable and configurable via
template parameters, allowing the user to select which checks to perform
and how to handle errors (throwing exceptions, calling registered
callbacks or whatever).
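
A possible shape for such a pluggable validator, sketched under the assumption that the error handling is passed as a template parameter. NestingChecker, ThrowOnError and CollectErrors are hypothetical names, and the only check shown is tag nesting.

```d
// Hypothetical error policy: stop at the first problem.
struct ThrowOnError
{
    void onError(string msg) { throw new Exception(msg); }
}

// Hypothetical error policy: record problems and keep going.
struct CollectErrors
{
    string[] errors;
    void onError(string msg) { errors ~= msg; }
}

// Hypothetical well-formedness check sitting between the low level
// parser and the high level API. It only verifies tag nesting; how an
// error is handled is decided by the ErrorPolicy template parameter.
struct NestingChecker(ErrorPolicy)
{
    ErrorPolicy policy;
    private string[] open; // names of the currently open elements

    void startTag(string name) { open ~= name; }

    void endTag(string name)
    {
        if (open.length == 0)
            policy.onError("unexpected </" ~ name ~ ">");
        else if (open[$ - 1] != name)
            policy.onError("expected </" ~ open[$ - 1]
                           ~ "> but found </" ~ name ~ ">");
        else
            open = open[0 .. $ - 1];
    }

    // Call once at end of input to report elements that were never closed.
    void finish()
    {
        foreach_reverse (name; open)
            policy.onError("missing </" ~ name ~ ">");
        open = null;
    }
}
```

With NestingChecker!ThrowOnError the first structural issue aborts parsing, while NestingChecker!CollectErrors lets the caller inspect policy.errors afterwards. A callback-based policy would fit the same onError slot.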
After all of this is done, the next step should be an XPath
library (I don't know much about this).
This is just an early sketch, but I'd love to get some feedback.
Thank you for your time.