std.xml and Adam D Ruppe's dom module

Tue Feb 7 17:44:08 PST 2012

On Tuesday, February 07, 2012 00:56:40 Adam D. Ruppe wrote:
> On Monday, 6 February 2012 at 23:47:08 UTC, Jonathan M Davis wrote:
> > Also, two of the major requirements for an improved std.xml are
> > that it needs to have a range-based API, and it needs to be
> > fast.
> 
> What does range based API mean in this context? I do offer
> a couple ranges over the tree, but it really isn't the main
> thing there.
> 
> Check out Element.tree() for the main one.
> 
> 
> But, if you mean taking a range for input, no, doesn't
> do that. I've been thinking about rewriting the parse
> function (if you look at it, you'll probably hate it
> too!). But, what I have works and is tested on a variety
> of input, including garbage that was a pain to get working
> right, so I'm in no rush to change it.
> 
> > Tango's XML parser has pretty much set the bar on speed
> 
> Yeah, I'm pretty sure Tango whips me hard on speed. I spent
> some time in the profiler a month or two ago and got a
> significant speedup over the datasets I use (html files),
> but I'm sure there's a whole lot more that could be done.
> 
> 
> 
> The biggest thing is I don't think you could use my parse
> function as a stream.

Ideally, std.xml would operate on ranges of dchar (but obviously be optimized
for strings, since there are lots of optimizations that can be done with
string processing, at least as far as Unicode goes), and it would return a
range of some kind. The result would probably be a document type which
provided a range of its top-level nodes (or maybe just the root node), each
of which then provided a range over its sub-nodes, etc. At least, that's the
kind of thing that I would expect. Other calls on the document and nodes may
not be range-based at all (e.g. XPath should probably be supported, and that
doesn't necessarily involve ranges). The best way to handle it all would
depend on the implementation. I haven't implemented a full-blown XML parser,
so I don't know what the best way to go about it would be, but ideally, you'd
be able to process the nodes as a range.
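
As a rough illustration of what "process the nodes as a range" could look
like, here is a minimal sketch in D. It is not a parser and not a proposal
for the real API: Node, DepthFirst, and leafNames are hypothetical names made
up for this example, and they are not part of std.xml or of Adam's dom
module. The tree is built by hand; the only point is that once the nodes are
exposed as an input range, the usual std.algorithm functions work on them.

```d
import std.algorithm : filter, map;
import std.array : array;
import std.range : isInputRange;

// Hypothetical node type for illustration only.
struct Node
{
    string name;
    Node[] kids;

    // Direct children; a slice is already a range in D.
    Node[] children() { return kids; }
}

// Hypothetical depth-first (pre-order) input range over a node
// and all of its descendants.
struct DepthFirst
{
    private Node[] stack;

    this(Node root) { stack = [root]; }

    @property bool empty() const { return stack.length == 0; }

    @property Node front() { return stack[$ - 1]; }

    void popFront()
    {
        auto n = stack[$ - 1];
        stack = stack[0 .. $ - 1];
        // Push children in reverse so the first child is visited first.
        foreach_reverse (k; n.kids)
            stack ~= k;
    }
}

static assert(isInputRange!DepthFirst);

// Names of all nodes without children, in document order.
string[] leafNames(Node root)
{
    return DepthFirst(root)
        .filter!(n => n.kids.length == 0)
        .map!(n => n.name)
        .array;
}

void main()
{
    auto doc = Node("root", [
        Node("a", [Node("b"), Node("c")]),
        Node("d"),
    ]);

    assert(leafNames(doc) == ["b", "c", "d"]);
}
```

A real implementation would presumably build the nodes lazily from the input
range rather than from a tree that is already in memory, but the consuming
side could look much the same.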

- Jonathan M Davis