std.xml2 (collecting features)
Jonathan M Davis via Digitalmars-d
digitalmars-d at puremagic.com
Mon May 4 12:14:15 PDT 2015
On Sunday, 3 May 2015 at 17:39:48 UTC, Robert burner Schadek
wrote:
> std.xml has been considered not up to spec for nearly 3 years now.
> Time to build a successor. I currently plan the following
> features for it:
>
> - SAX and DOM parser
> - in-situ / slicing parsing when possible (forward range?)
> - compile time switch (CTS) for lazy attribute parsing
> - CTS for encoding (ubyte(ASCII), char(utf8), ... )
> - CTS for input validating
> - performance
>
> Not much code yet, I'm currently building the performance test
> suite https://github.com/burner/std.xml2
>
> Please post your feature requests, and please keep the posts DRY
> and on topic.
If I were doing it, I'd do three types of parsers:
1. A parser that was pretty much as low level as you can get,
where you basically get a range of XML attributes or tags. Exactly how
to build that could be a bit entertaining, since it would have to
be hierarchical and ranges aren't, but something like a range of
tags, where you can get a range of each tag's attributes and sub-tags
from it, would let the whole document be processed without
actually getting to the level of even a SAX parser. That parser
could then be used to build the other parsers, and anyone who
needed insanely fast speeds could use it rather than the SAX or
DOM parser, so long as they were willing to pay the inevitable
loss in user-friendliness.
2. SAX parser built on the low level parser.
3. DOM parser built either on the low level parser or the SAX
parser (whichever made more sense).
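To make the layering a little more concrete, here is a rough sketch in Python. Python is chosen only for illustration: this is not D and not std.xml2's API, and every name in it (Tag, Cursor, parse, sax, dom) is hypothetical. It assumes a toy subset of XML (elements, quoted attributes and text, with no comments, CDATA, DOCTYPE or entity decoding). The low level layer is a forward-only hierarchical cursor. Each Tag offers lazily parsed attributes() and a children() iterator over text and sub-tags, all sharing one pass over the input, and anything the caller doesn't read is simply skipped. A SAX-style callback layer is then built on it, and a dict-based DOM on top of that.

```python
import re

# Deliberately tiny XML subset: elements, quoted attributes and text only.
# No comments, CDATA, processing instructions, DOCTYPE or entity decoding.
_TAG = re.compile(r"<(/?)([A-Za-z_:][\w.:\-]*)"
                  r"((?:\s+[^\s=/>]+\s*=\s*(?:\"[^\"]*\"|'[^']*'))*)\s*(/?)>")
_ATTR = re.compile(r"([^\s=/>]+)\s*=\s*(?:\"([^\"]*)\"|'([^']*)')")

class Tag:
    """Low level layer: one element whose attributes and children are read lazily."""
    def __init__(self, cursor, name, attr_start, attr_end):
        self._cursor, self.name, self.closed = cursor, name, False
        self._attr_span = (attr_start, attr_end)  # offsets only, parsed on demand

    def attributes(self):
        start, end = self._attr_span
        for m in _ATTR.finditer(self._cursor.text, start, end):
            yield m.group(1), m.group(2) if m.group(2) is not None else m.group(3)

    def children(self):
        """Yield str (text) and Tag (sub-element) from the shared forward cursor."""
        cur = self._cursor
        while not self.closed:
            while cur.stack[-1] is not self:  # discard whatever the caller left unread
                cur.advance()
            kind, value = cur.advance()
            if kind != "end":  # an "end" here is our own closing tag
                yield value

class Cursor:
    """One forward pass over the input; a stack of open tags checks nesting."""
    def __init__(self, text):
        self.text, self.pos, self.stack, self._pending_end = text, 0, [], None

    def advance(self):
        """Consume one token: ("text", str), ("start", Tag) or ("end", Tag)."""
        if self._pending_end is not None:  # second half of a self-closing tag
            name, self._pending_end = self._pending_end, None
            return self._close(name)
        if self.pos >= len(self.text):
            raise ValueError("unexpected end of input")
        if self.text[self.pos] != "<":
            lt = self.text.find("<", self.pos)
            end = len(self.text) if lt == -1 else lt
            chunk, self.pos = self.text[self.pos:end], end
            return "text", chunk
        m = _TAG.match(self.text, self.pos)
        if m is None:
            raise ValueError(f"malformed markup at offset {self.pos}")
        self.pos = m.end()
        if m.group(1):  # "</name>"
            return self._close(m.group(2))
        tag = Tag(self, m.group(2), m.start(3), m.end(3))
        self.stack.append(tag)
        if m.group(4):  # "<name/>" is reported as a start followed by an end
            self._pending_end = tag.name
        return "start", tag

    def _close(self, name):
        if not self.stack or self.stack[-1].name != name:
            raise ValueError(f"mismatched </{name}>")
        tag = self.stack.pop()
        tag.closed = True
        return "end", tag

def parse(text):
    """Return the root Tag, having read only as far as its start tag."""
    cur = Cursor(text)
    while True:
        kind, value = cur.advance()
        if kind == "start":
            return value
        if value.strip():  # only whitespace may precede the root in this subset
            raise ValueError("content before the root element")

def sax(tag, on_start, on_end, on_text):
    """SAX-style layer: push callbacks driven by the low level Tag API."""
    on_start(tag.name, dict(tag.attributes()))
    for child in tag.children():
        if isinstance(child, str):
            on_text(child)
        else:
            sax(child, on_start, on_end, on_text)
    on_end(tag.name)

def dom(root):
    """DOM layer: materialize the tree as nested dicts, built on the SAX layer."""
    stack = [{"children": []}]  # sentinel parent for the root element
    def on_start(name, attrs):
        node = {"name": name, "attrs": attrs, "children": []}
        stack[-1]["children"].append(node)
        stack.append(node)

    sax(root, on_start, lambda name: stack.pop(),
        lambda text: stack[-1]["children"].append(text))
    return stack[0]["children"][0]
```

For instance, [dict(c.attributes())["id"] for c in parse(doc).children() if isinstance(c, Tag)] collects the id of each top-level child without ever building a tree. The cost is the loss of convenience mentioned above: the low level API is forward-only, so once the caller moves past an element, its contents can't be revisited.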
I doubt that I'm really explaining the low level parser well
enough or have even thought it through enough, but I really think
that even a SAX parser is too high level for the base parser and
that something only slightly higher level than a lexer (high enough
to actually be processing XML rather than individual tokens, but
only as high as is required to do that) would be a
far better choice.
IIRC, Michel Fortin's work went in that direction, and he linked
to his code in another post, so I'd suggest at least looking at
that for ideas.
Regardless, by building layers of XML parsers rather than just
the standard ones, it should be possible to get higher
performance while still having the more standard, user-friendly
ones for those that don't need the full performance and do need
the user-friendliness (though of course, we do want the SAX and
DOM parsers to be efficient as well).
- Jonathan M Davis