Replacing std.xml

Michel Fortin michel.fortin at michelf.ca
Thu Aug 29 09:14:28 PDT 2013


On 2013-08-29 07:47:17 +0000, Jonathan M Davis <jmdavisProg at gmx.com> said:

> On Thursday, August 29, 2013 09:25:35 w0rp wrote:
>> The general idea in my mind is
>> "something SAX-like, with something a little DOM-like."
> 
> What I personally think would be best is to have multiple parsers. First you
> have something StAX-like (or maybe even lower level - I don't recall exactly
> what StAX gives you at the moment) that basically tokenizes the XML and
> returns a range of tokens. Then SAX and DOM parsers can be built on top of that.
> That way, you get the fastest parser possible as well as higher level, more
> functional parsers.
> 
> But two of the biggest points of the design are that it's going to have to be
> range-based, and it's going to need to be able to take full advantage of
> slices (when used with any strings or random-access ranges) in order to avoid
> copying any of the data. That's the key design point which will allow a D
> parser to be extremely fast in comparison to parsers in most other languages.

I wrote something like that a while ago.

It only accepted arrays as input because there was no "buffered 
range" concept that would allow lookahead and efficient slicing from any 
kind of range, but that could be retrofitted. It implements pretty 
much all of the XML spec, except for documents with an internal DTD 
subset (which is a somewhat arcane feature). It does not handle 
namespaces either; I feel that should be done in a layer above, but 
I'm not entirely sure.

Lower-level parser:
http://michelf.ca/docs/d/mfr/xmltok.html

Higher-level parser built on the first one:
http://michelf.ca/docs/d/mfr/xml.html

The code:
http://michelf.ca/docs/d/mfr-xml-2010-10-19.zip

That code hasn't been compiled in a while, but it used to work very 
well for me. Feel free to use it as a starting point.
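As an illustration of the layering discussed in this thread (a low-level tokenizer producing a lazy stream of slices, with a DOM built on top of it), here is a toy sketch. It is written in Python rather than D and is not taken from the mfr code linked above; the names tokenize, Token, Element and build_dom are hypothetical. It assumes well-formed input with no attributes, comments, processing instructions, CDATA sections, entity references, or namespaces. Python's memoryview stands in for D array slices: each token refers into the original buffer instead of copying it.

```python
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str          # "start", "end", "empty" (<x/>), or "text"
    text: memoryview   # slice of the original buffer; no bytes are copied


def tokenize(buf: bytes) -> Iterator[Token]:
    """Lowest layer: lazily yield tokens as zero-copy slices of buf.

    Toy assumption: well-formed input, tags without attributes, and no
    comments, PIs, CDATA sections, or entity references.
    """
    view = memoryview(buf)
    i, n = 0, len(buf)
    while i < n:
        if buf[i] == ord("<"):
            j = buf.index(b">", i)           # end of this tag
            if buf[i + 1] == ord("/"):       # </name>
                yield Token("end", view[i + 2:j])
            elif buf[j - 1] == ord("/"):     # <name/>
                yield Token("empty", view[i + 1:j - 1])
            else:                            # <name>
                yield Token("start", view[i + 1:j])
            i = j + 1
        else:
            j = buf.find(b"<", i)            # text runs up to the next tag
            if j == -1:
                j = n
            yield Token("text", view[i:j])
            i = j


class Element:
    def __init__(self, name: str) -> None:
        self.name = name
        self.children: list = []  # Element instances and str text nodes


def build_dom(buf: bytes) -> Element:
    """Higher layer: build a tree from the token stream (this copies data)."""
    root = Element("#document")
    stack = [root]
    for tok in tokenize(buf):
        if tok.kind == "text":
            stack[-1].children.append(bytes(tok.text).decode("utf-8"))
            continue
        name = bytes(tok.text).decode("utf-8")
        if tok.kind == "end":
            if stack[-1].name != name:
                raise ValueError(f"mismatched end tag </{name}>")
            stack.pop()
        else:
            el = Element(name)
            stack[-1].children.append(el)
            if tok.kind == "start":
                stack.append(el)
    if len(stack) != 1:
        raise ValueError("unclosed element")
    return root
```

In this sketch only the DOM layer copies data (when it decodes names and text into str). A SAX-style layer could instead hand the memoryview slices straight to its callbacks, keeping the whole pipeline copy-free, which is the property the slicing argument above is after.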

-- 
Michel Fortin
michel.fortin at michelf.ca
http://michelf.ca



More information about the Digitalmars-d mailing list