std.xml2 (collecting features)

Jacob Carlborg via Digitalmars-d digitalmars-d at puremagic.com
Mon May 4 12:26:08 PDT 2015


On 2015-05-04 21:14, Jonathan M Davis wrote:

> If I were doing it, I'd do three types of parsers:
>
> 1. A parser that was pretty much as low level as you can get, where you
> basically get a range of XML attributes or tags. Exactly how to build that
> could be a bit entertaining, since it would have to be hierarchical, and
> ranges aren't, but something like a range of tags where you can get a
> range of its attributes and sub-tags from it so that the whole document
> can be processed without actually getting to the level of even a SAX
> parser. That parser could then be used to build the other parsers, and
> anyone who needed insanely fast speeds could use it rather than the SAX
> or DOM parser so long as they were willing to pay the inevitable loss in
> user-friendliness.
>
> 2. SAX parser built on the low level parser.
>
> 3. DOM parser built either on the low level parser or the SAX parser
> (whichever made more sense).
>
> I doubt that I'm really explaining the low level parser well enough or
> have even thought through it enough, but I really think that even a SAX
> parser is too high level for the base parser and that something that is
> slightly higher than a lexer (high enough to actually be processing XML
> rather than individual tokens but pretty much only as high as is
> required to do that) would be a far better choice.
>
> IIRC, Michel Fortin's work went in that direction, and he linked to his
> code in another post, so I'd suggest at least looking at that for ideas.

This is how the XML parser is structured in Tango: a pull parser at the
lowest level, a SAX parser on top of that, and I think the DOM parser
builds on top of the pull parser.
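
A rough sketch of that layering, written in Python for brevity rather than
D. Everything here is hypothetical and for illustration only: the
(kind, name, value) token shape, pull_tokens, Handler, sax_parse and
build_dom are made-up names, not Tango's API. pull_tokens is a hard-coded
stand-in for a real pull parser; the point is only that both the SAX layer
and the DOM builder can be plain loops over the same token stream.

```python
def pull_tokens():
    """Stand-in for a pull parser reading: <a x="1"><b/>hi</a>"""
    yield ("start_element", "a", None)
    yield ("attribute", "x", "1")
    yield ("start_element", "b", None)
    yield ("end_empty_element", "b", None)
    yield ("data", None, "hi")
    yield ("end_element", "a", None)


class Handler:
    """SAX-style callbacks; this one just records what it is told."""
    def __init__(self):
        self.events = []

    def start_element(self, name, attrs):
        self.events.append(("start", name, attrs))

    def end_element(self, name):
        self.events.append(("end", name))

    def characters(self, text):
        self.events.append(("text", text))


def sax_parse(tokens, handler):
    """SAX layer: pull tokens and push them into handler callbacks."""
    pending = None  # (name, attrs) of a start tag still collecting attributes

    def flush():
        nonlocal pending
        if pending is not None:
            handler.start_element(pending[0], pending[1])
            pending = None

    for kind, name, value in tokens:
        if kind == "start_element":
            flush()
            pending = (name, {})
        elif kind == "attribute":
            pending[1][name] = value
        elif kind in ("end_element", "end_empty_element"):
            flush()
            handler.end_element(name)
        elif kind == "data":
            flush()
            handler.characters(value)


def build_dom(tokens):
    """DOM layer built directly on the pull tokens, bypassing SAX."""
    root = {"name": None, "attrs": {}, "children": []}
    stack = [root]
    for kind, name, value in tokens:
        if kind == "start_element":
            node = {"name": name, "attrs": {}, "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
        elif kind == "attribute":
            stack[-1]["attrs"][name] = value
        elif kind in ("end_element", "end_empty_element"):
            stack.pop()
        elif kind == "data":
            stack[-1]["children"].append(value)
    return root["children"][0]
```

The SAX layer has to buffer a start tag until its attributes have all
arrived, which is one reason a DOM builder may find the pull interface the
more convenient base.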

The Tango pull parser can give you the following tokens:

* start element
* attribute
* end element
* end empty element
* data
* comment
* cdata
* doctype
* pi
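
To make the token kinds above concrete, here is a minimal sketch of a
tokenizer that emits them, again in Python rather than D. It is not Tango's
implementation and not a conforming XML parser: it is regex-based, only
handles double-quoted attribute values, does no entity decoding or
well-formedness checking, and the names Kind and tokens as well as the
(kind, name, value) token shape are assumptions made for illustration.

```python
import re
from enum import Enum, auto


class Kind(Enum):
    START_ELEMENT = auto()
    ATTRIBUTE = auto()
    END_ELEMENT = auto()
    END_EMPTY_ELEMENT = auto()
    DATA = auto()
    COMMENT = auto()
    CDATA = auto()
    DOCTYPE = auto()
    PI = auto()


# One alternative per markup construct; anything between matches is data.
_MARKUP = re.compile(
    r"<!--(?P<comment>.*?)-->"
    r"|<!\[CDATA\[(?P<cdata>.*?)\]\]>"
    r"|<!DOCTYPE\s+(?P<doctype>[^>]*)>"
    r"|<\?(?P<pi>.*?)\?>"
    r"|</(?P<end>[^\s>]+)\s*>"
    r"|<(?P<start>[^\s/>]+)(?P<attrs>[^>]*?)(?P<empty>/?)>",
    re.DOTALL,
)
_ATTR = re.compile(r'([^\s=]+)\s*=\s*"([^"]*)"')


def tokens(xml):
    """Lazily yield (kind, name, value) tuples, one token at a time."""
    pos = 0
    for m in _MARKUP.finditer(xml):
        if m.start() > pos:
            yield (Kind.DATA, None, xml[pos:m.start()])
        pos = m.end()
        if m.group("comment") is not None:
            yield (Kind.COMMENT, None, m.group("comment"))
        elif m.group("cdata") is not None:
            yield (Kind.CDATA, None, m.group("cdata"))
        elif m.group("doctype") is not None:
            yield (Kind.DOCTYPE, None, m.group("doctype"))
        elif m.group("pi") is not None:
            yield (Kind.PI, None, m.group("pi"))
        elif m.group("end") is not None:
            yield (Kind.END_ELEMENT, m.group("end"), None)
        else:
            yield (Kind.START_ELEMENT, m.group("start"), None)
            for name, value in _ATTR.findall(m.group("attrs")):
                yield (Kind.ATTRIBUTE, name, value)
            if m.group("empty"):
                # <e/> closes itself: no separate end element follows.
                yield (Kind.END_EMPTY_ELEMENT, m.group("start"), None)
    if pos < len(xml):
        yield (Kind.DATA, None, xml[pos:])
```

Because tokens is a generator, the caller pulls one token at a time and can
stop early, which is the property the higher-level parsers rely on.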

-- 
/Jacob Carlborg

