High performance XML parser

Fri Feb 4 14:03:08 PST 2011

Steven Schveighoffer <schveiguy at yahoo.com> wrote:

> Here is how I would approach it (without doing any research).
>
> First, we need a buffered I/O system where you can easily access and
> manipulate the buffer.  I proposed one a few months ago in this NG.
>
> Second, I'd implement the XML lib as a range where "front()" gives you  
> an XMLNode.  If the XMLNode is an element, it will have eager access to  
> the element tag, and lazy access to the attributes and the sub-nodes.   
> Each XMLNode will provide a forward range for the child nodes.
>
> Thus you can "skip" whole elements in the stream by popFront'ing a  
> range, and dive deeper via accessing the nodes of the range.
>
> I'm unsure how well this will work, or if you can accomplish all of it  
> without reallocation (in particular, you may need to store the element  
> information, maybe via a specialized member function?).

Question:

For the lazily computed attributes and subnodes, will accessing one of them
cause all of them to be computed? Same goes for getting the number of
child elements.
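
To make the question concrete, here is a minimal sketch of one way the lazy
child range could behave. It is in Python only to keep it short, and it is not
Steven's design: Parser, Node, children(), skip() and elements_seen are all
made-up names, and the tokenizer only understands <tag>, </tag> and text (no
attributes). In this sketch, taking the first child parses nothing beyond it,
while counting the children has to walk the whole element.

```python
import re

# Hypothetical, deliberately tiny tokenizer: only <tag>, </tag> and text.
TOKEN = re.compile(r"<(/?)(\w+)>|([^<]+)")


class Parser:
    def __init__(self, text):
        self._text = text
        self._pos = 0
        self.elements_seen = 0  # number of open tags tokenized so far

    def next_token(self):
        m = TOKEN.match(self._text, self._pos)
        if m is None:
            return None  # end of input (or something this sketch cannot read)
        self._pos = m.end()
        if m.group(3) is not None:
            return ("text", m.group(3))
        if m.group(1):
            return ("close", m.group(2))
        self.elements_seen += 1
        return ("open", m.group(2))

    def root(self):
        # Assumption: the input starts directly with the root's open tag.
        tok = self.next_token()
        return Node(tok[1], self)


class Node:
    """The tag is known eagerly; child elements are parsed on demand."""

    def __init__(self, tag, parser):
        self.tag = tag
        self._parser = parser
        self._current = None  # child handed out last, possibly only partly read
        self._done = False    # True once our own close tag has been consumed

    def children(self):
        while True:
            if self._current is not None:
                # Consume whatever the caller left unread in the previous child.
                self._current.skip()
                self._current = None
            if self._done:
                return
            tok = self._parser.next_token()
            if tok is None or tok[0] == "close":
                self._done = True
            elif tok[0] == "open":
                self._current = Node(tok[1], self._parser)
                yield self._current
            # Text tokens are ignored in this sketch.

    def skip(self):
        # Skipping an element still tokenizes it, but builds nothing.
        for _ in self.children():
            pass


p = Parser("<a><b><x></x><y></y></b><c></c><d></d></a>")
kids = p.root().children()
first = next(kids)                  # parses only up to <b>
seen_after_first = p.elements_seen  # 2: <a> and <b>
count = 1 + sum(1 for _ in kids)    # counting forces the rest: 3 children
seen_after_count = p.elements_seen  # 6: every element in the document
```

So with a forward-only design like this one, the answer to the first question
would be "no" and to the second "yes", unless the count is stored somewhere
up front.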

Also, can this be efficiently combined with mmapping? What I sorta imagine
is a kind of lazy slice: it determines whether it ends within this page, and
if not, does not progress past that page until asked to do so.
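
Roughly what I have in mind, as a minimal sketch (Python again, using its
standard mmap module; LazySlice, advance() and value() are hypothetical names,
and I assume a single-byte terminator so a match can never straddle a page
boundary). Each advance() call scans no further than the end of the current
page:

```python
import mmap
import tempfile

PAGE = mmap.PAGESIZE


class LazySlice:
    """Slice [start, end) whose end is discovered one page at a time."""

    def __init__(self, buf, start, terminator):
        self.buf = buf
        self.start = start
        self.terminator = terminator  # assumed to be a single byte
        self.scanned = start          # [start, scanned) holds no terminator
        self.end = None               # unknown until the terminator is found

    def advance(self):
        """Scan at most to the end of the current page.

        Returns True once the end of the slice is known.
        """
        if self.end is not None:
            return True
        page_end = min((self.scanned // PAGE + 1) * PAGE, len(self.buf))
        hit = self.buf.find(self.terminator, self.scanned, page_end)
        if hit != -1:
            self.end = hit
        elif page_end == len(self.buf):
            self.end = page_end       # ran out of input
        else:
            self.scanned = page_end   # stop here until asked again
        return self.end is not None

    def value(self):
        # Only now do we insist on reaching the end, page by page.
        while not self.advance():
            pass
        return self.buf[self.start:self.end]


def demo():
    payload = b"x" * (PAGE + 10)      # text content that crosses a page boundary
    with tempfile.TemporaryFile() as f:
        f.write(b"<t>" + payload + b"</t>")
        f.flush()
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as buf:
            s = LazySlice(buf, start=3, terminator=b"<")
            first = s.advance()       # stops at the end of the first page
            second = s.advance()      # finds "<" on the second page
            return first, second, s.value() == payload
```

Whether this actually saves anything depends on how the OS pages the file in
(readahead and so on), which the sketch does not measure.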

-- 
Simen