std.xml2 (collecting features)

Mon May 4 12:14:15 PDT 2015

On Sunday, 3 May 2015 at 17:39:48 UTC, Robert burner Schadek 
wrote:
> std.xml has been considered not up to specs for nearly 3 years now. 
> Time to build a successor. I currently plan the following 
> features for it:
>
> - SAX and DOM parser
> - in-situ / slicing parsing when possible (forward range?)
> - compile time switch (CTS) for lazy attribute parsing
> - CTS for encoding (ubyte(ASCII), char(utf8), ... )
> - CTS for input validating
> - performance
>
> Not much code yet, I'm currently building the performance test 
> suite https://github.com/burner/std.xml2
>
> Please post your feature requests, and please keep the posts DRY 
> and on topic.

If I were doing it, I'd do three types of parsers:

1. A parser that was pretty much as low level as you can get, 
where you basically get a range of XML attributes or tags. Exactly 
how to build that could be a bit entertaining, since it would have 
to be hierarchical, and ranges aren't. But it could be something 
like a range of tags where each tag gives you a range of its 
attributes and a range of its sub-tags, so that the whole document 
can be processed without actually getting to the level of even a 
SAX parser. That parser could then be used to build the other 
parsers, and anyone who needed insanely fast speeds could use it 
rather than the SAX or DOM parser, so long as they were willing to 
pay the inevitable loss in user-friendliness.

2. SAX parser built on the low level parser.

3. DOM parser built either on the low level parser or the SAX 
parser (whichever made more sense).
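To make the layering above concrete, here is a minimal sketch in Python (chosen only for brevity; a real std.xml2 would be built from D ranges and templates). Every name in it (`events`, `Event`, `Handler`, `sax_parse`, `Element`, `build_dom`) is hypothetical, not an existing std.xml2 or Phobos API. One assumption to flag: the bottom layer here is a flat, lazy stream of start/end/text events, which is one common way to carry a hierarchy through a linear range. It is not the nested range-of-tags-with-sub-ranges design from item 1. Attributes stay as unparsed text until `attributes()` is called, loosely mirroring the lazy attribute parsing switch in the quoted feature list. Python string slices copy, unlike D slices, so the in-situ aspect is not shown. The SAX and DOM layers are then thin loops over that one stream.

```python
import re
from dataclasses import dataclass
from typing import Iterator

# Hypothetical low-level layer. Deliberately tiny: it recognizes start,
# end, and empty-element tags plus text. It does NOT handle comments,
# CDATA, processing instructions, DOCTYPE, entity decoding, single-quoted
# attributes, or a '>' inside attribute values.
TAG = re.compile(r"<(/?)([A-Za-z_][\w.:-]*)([^>]*?)(/?)>")
ATTR = re.compile(r'([A-Za-z_][\w.:-]*)\s*=\s*"([^"]*)"')


@dataclass
class Event:
    kind: str       # "start", "end", or "text"
    name: str = ""  # element name for "start"/"end"
    raw: str = ""   # unparsed attribute text ("start") or character data ("text")

    def attributes(self) -> dict:
        # Parsed only on request: a runtime stand-in for lazy attribute parsing.
        return dict(ATTR.findall(self.raw))


def events(doc: str) -> Iterator[Event]:
    """Lazily yield XML-level events; whitespace-only text is skipped."""
    pos = 0
    for m in TAG.finditer(doc):
        text = doc[pos:m.start()]
        if text.strip():
            yield Event("text", raw=text)
        closing, name, attrs, empty = m.groups()
        if closing:
            yield Event("end", name)
        else:
            yield Event("start", name, attrs)
            if empty:  # <x/> is reported as a start followed by an end
                yield Event("end", name)
        pos = m.end()
    if doc[pos:].strip():
        yield Event("text", raw=doc[pos:])


# SAX-style layer: push callbacks driven by the low-level event stream.
class Handler:
    def start_element(self, name: str, attrs: dict) -> None: pass
    def end_element(self, name: str) -> None: pass
    def characters(self, text: str) -> None: pass


def sax_parse(doc: str, handler: Handler) -> None:
    for ev in events(doc):
        if ev.kind == "start":
            handler.start_element(ev.name, ev.attributes())
        elif ev.kind == "end":
            handler.end_element(ev.name)
        else:
            handler.characters(ev.raw)


# DOM layer: builds a tree from the same low-level stream.
@dataclass
class Element:
    name: str
    attrs: dict
    children: list  # Element instances and text strings, in document order


def build_dom(doc: str) -> Element:
    root, stack = None, []
    for ev in events(doc):
        if ev.kind == "start":
            el = Element(ev.name, ev.attributes(), [])
            if stack:
                stack[-1].children.append(el)
            elif root is None:
                root = el
            else:
                raise ValueError("more than one root element")
            stack.append(el)
        elif ev.kind == "end":
            if not stack or stack.pop().name != ev.name:
                raise ValueError(f"unexpected </{ev.name}>")
        elif stack:  # text outside the root element is ignored
            stack[-1].children.append(ev.raw)
    if root is None or stack:
        raise ValueError("empty document or unclosed element")
    return root


if __name__ == "__main__":
    dom = build_dom('<root v="1"><item id="a">hi</item><empty/></root>')
    print(dom.name, dom.attrs, [c.name for c in dom.children])
    # prints: root {'v': '1'} ['item', 'empty']
```

A consumer that only needs, say, the number of elements can stay on the bottom layer with `sum(1 for e in events(doc) if e.kind == "start")`, which never parses an attribute or builds a tree. That is the speed-for-convenience trade described above.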

I doubt that I'm really explaining the low level parser well 
enough or have even thought it through enough, but I really think 
that even a SAX parser is too high level for the base parser and 
that something only slightly higher level than a lexer (high enough 
to actually be processing XML rather than individual tokens, but 
pretty much only as high as is required to do that) would be a 
far better choice.

IIRC, Michel Fortin's work went in that direction, and he linked 
to his code in another post, so I'd suggest at least looking at 
that for ideas.

Regardless, by building layers of XML parsers rather than just 
the standard ones, it should be possible to get higher 
performance while still having the more standard, user-friendly 
ones for those who don't need the full performance and do need 
the user-friendliness (though of course, we do want the SAX and 
DOM parsers to be efficient as well).

- Jonathan M Davis