Replacing std.xml

Thu Aug 29 10:40:38 PDT 2013

On Thursday, August 29, 2013 12:14:28 Michel Fortin wrote:
> On 2013-08-29 07:47:17 +0000, Jonathan M Davis <jmdavisProg at gmx.com> said:
> > On Thursday, August 29, 2013 09:25:35 w0rp wrote:
> >> The general idea in my mind is
> >> "something SAX-like, with something a little DOM-like."
> > 
> > What I personally think would be best is to have multiple parsers. First
> > you have something StAX-like (or maybe even lower level - I don't recall
> > exactly what StAX gives you at the moment) that basically tokenizes the
> > XML and returns a range of that. Then SAX and DOM parsers can be built on
> > top of that. That way, you get the fastest parser possible as well as
> > higher level, more functional parsers.
> > 
> > But two of the biggest points of the design are that it's going to have to
> > be range-based, and it's going to need to be able to take full advantage
> > of slices (when used with any strings or random-access ranges) in order
> > to avoid copying any of the data. That's the key design point which will
> > allow a D parser to be extremely fast in comparison to parsers in most
> > other languages.
> I wrote something like that a while ago.
> 
> It only accepted arrays as input because of the lack of a "buffered
> range" concept that'd allow lookahead and efficient slicing from any
> kind of range, but that could be retrofitted in. It implements pretty
> much all of the XML spec, except for documents having an internal
> subset (which is something a little arcane). It does not deal with
> namespaces either; I feel like that should be done a layer above, but
> I'm not entirely sure.
> 
> Lower-level parser:
> http://michelf.ca/docs/d/mfr/xmltok.html
> 
> Higher-level parser built on the first one:
> http://michelf.ca/docs/d/mfr/xml.html
> 
> The code:
> http://michelf.ca/docs/d/mfr-xml-2010-10-19.zip
> 
> That code hasn't been compiled in a while, but it used to work very
> well for me. Feel free to use as a starting point.

Cool. I started looking at implementing something like that a while back but 
really didn't have time to get very far. But if we really care about efficiency, 
I think that that's the basic approach that we need to take. However, the trick,
as always, is someone having the time to do it. Maybe one of us can take what
you did and start from there, or at least use it as an example to start from.

- Jonathan M Davis
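A minimal sketch of the layering discussed in this thread, with a low-level tokenizer producing a lazy stream of tokens and a DOM-like builder on top of it. It rests on several assumptions. It is written in Python rather than D. A generator stands in for a D input range, and memoryview slices stand in for D array slices, so each token refers back into the input buffer instead of copying it. The names Token, tokenize and build_tree are hypothetical. They are not the API of std.xml or of the xmltok module linked above. Only a toy subset of XML is handled: start, end and empty-element tags plus text. There is no attribute parsing and no support for comments, CDATA, processing instructions, entities or namespaces.

```python
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    # Hypothetical token type: kind is "start", "end", "empty" or "text".
    kind: str
    name: memoryview  # slice holding the tag name (empty for text tokens)
    text: memoryview  # slice covering the raw token in the input


def tokenize(data: bytes) -> Iterator[Token]:
    """Lazily tokenize a toy subset of XML without copying any input."""
    buf = memoryview(data)  # slicing a memoryview shares the underlying bytes
    i, n = 0, len(data)
    while i < n:
        if data[i] != ord("<"):
            # Character data runs up to the next '<' (or end of input).
            j = data.find(b"<", i)
            if j == -1:
                j = n
            yield Token("text", buf[i:i], buf[i:j])
            i = j
            continue
        j = data.find(b">", i)
        if j == -1:
            raise ValueError(f"unterminated tag at offset {i}")
        if data[i + 1] == ord("/"):
            # End tag: </name>
            yield Token("end", buf[i + 2:j], buf[i:j + 1])
        else:
            # Start tag <name ...> or empty-element tag <name .../>;
            # attributes are left unparsed inside the raw text slice.
            empty = data[j - 1] == ord("/")
            stop = j - 1 if empty else j
            k = i + 1
            while k < stop and data[k] not in b" \t\r\n":
                k += 1
            kind = "empty" if empty else "start"
            yield Token(kind, buf[i + 1:k], buf[i:j + 1])
        i = j + 1


def build_tree(tokens: Iterator[Token]) -> dict:
    """A DOM-like layer built only on top of the token stream."""
    root: dict = {"name": memoryview(b""), "children": []}
    stack = [root]
    for tok in tokens:
        if tok.kind == "text":
            stack[-1]["children"].append(tok.text)
        elif tok.kind in ("start", "empty"):
            node = {"name": tok.name, "children": []}
            stack[-1]["children"].append(node)
            if tok.kind == "start":
                stack.append(node)
        else:  # "end": must close the innermost open element
            if len(stack) == 1 or bytes(stack[-1]["name"]) != bytes(tok.name):
                raise ValueError(f"unexpected end tag {bytes(tok.name)!r}")
            stack.pop()
    if len(stack) != 1:
        raise ValueError("unclosed element at end of input")
    return root
```

For example, tokenize(b'<a x="1">hi<br/></a>') yields start, text, empty and end tokens. Their name and text fields are views into the original bytes, so nothing is copied until a consumer converts a slice. A SAX-style layer could consume the same token stream through callbacks instead of building a tree.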