[due diligence] std.xml

Tue Oct 19 13:43:04 PDT 2010

> Well one obvious problem is you have to read the document into memory
> first, which clearly isn't good enough for large documents.

I think that depends on the type of XML library we create.  A SAX
library doesn't require the whole document in memory, whereas a DOM
library typically does because, from what I can tell, it builds a
tree-like in-memory representation.  Without the document in memory,
I'm not sure how you could, for example, run XPath queries that reach
scattered, unrelated nodes with reasonable efficiency.  I say
"reasonable" because the in-memory layout can be quite fragmented too,
but that is still much cheaper than random access on disk.
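To make the memory trade-off concrete, here is a minimal sketch in
Python rather than D (it does not use std.xml, and the sample document
and function names are made up for illustration).  The SAX handler
sees one event at a time and keeps only a counter, while the
ElementTree version parses the whole document into a tree first, which
is what makes an arbitrary lookup like "the title of whatever element
has id 3" easy.

```python
import io
import xml.sax
import xml.etree.ElementTree as ET

# Hypothetical sample document used only for illustration
DOC = b"""<library>
  <book id="1"><title>A</title></book>
  <book id="2"><title>B</title></book>
  <magazine id="3"><title>C</title></magazine>
</library>"""


class TitleCounter(xml.sax.ContentHandler):
    """SAX-style handler: receives start-tag events one at a time and keeps
    only a counter, so its memory use does not grow with the document."""

    def __init__(self):
        super().__init__()
        self.titles = 0

    def startElement(self, name, attrs):
        if name == "title":
            self.titles += 1


def count_titles_streaming(data: bytes) -> int:
    handler = TitleCounter()
    # xml.sax.parse pulls input from the file-like object incrementally
    xml.sax.parse(io.BytesIO(data), handler)
    return handler.titles


def book_titles_tree(data: bytes) -> list:
    # DOM-style: the whole document becomes an in-memory tree first
    root = ET.fromstring(data)
    return [t.text for t in root.findall("./book/title")]


def title_for_id(data: bytes, ident: str):
    # Random access to any node via ElementTree's limited XPath subset
    root = ET.fromstring(data)
    node = root.find(f".//*[@id='{ident}']/title")
    return None if node is None else node.text
```

The streaming version could process a file far larger than RAM; the
tree version trades that away for the ability to jump to any node
after parsing.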

I guess one question we need to ask is: what do we expect from this
library?  Do we want a full DOM implementation, or is a SAX parser good
enough?  Or do we need something in between?  In PHP or Perl (perhaps
both), I saw a library that essentially transformed an XML document
into nested associative arrays.  It made reading data from the XML very
easy; however, I don't know how closely it followed the official
standards.
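As a rough illustration of the nested-associative-array idea, here is
a sketch in Python (not the PHP/Perl library mentioned above, whose
exact conventions I don't know).  The mapping rules are hypothetical:
attributes become "@name" keys, repeated child tags become lists, and a
leaf element without attributes collapses to its text.

```python
import xml.etree.ElementTree as ET


def to_nested(elem):
    """Convert an element into nested dicts/lists (hypothetical convention)."""
    children = list(elem)
    if not children and not elem.attrib:
        # Plain leaf: just its stripped text
        return (elem.text or "").strip()
    node = {"@" + k: v for k, v in elem.attrib.items()}
    for child in children:
        value = to_nested(child)
        if child.tag in node:
            # Second occurrence of a tag: promote the entry to a list
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value
    text = (elem.text or "").strip()
    if text:
        node["#text"] = text
    return node


def parse_to_nested(data: bytes) -> dict:
    root = ET.fromstring(data)
    return {root.tag: to_nested(root)}


CONFIG = b"""<config><server host="a" port="80"/><server host="b" port="81"/><name>demo</name></config>"""
data = parse_to_nested(CONFIG)
# data["config"]["server"][1]["@port"] is the second server's port
```

The convenience comes from being lossy: comments, mixed content, and
the relative order of differently named siblings are discarded, which
is exactly where such a mapping parts ways with the standards.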

The current std.xml looks like it tries to be both a DOM library and a
SAX library.  Personally, I'd rather split them into two libraries,
though it may make sense for the DOM library to use the SAX library to
build up its objects.
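Layering a DOM on top of SAX is mostly a matter of keeping a stack of
open elements while the events stream by.  A minimal sketch in Python
(the Node class and TreeBuilder name are hypothetical, and a real DOM
would also need namespaces, comments, and separate text nodes):

```python
import io
import xml.sax
from dataclasses import dataclass, field


@dataclass
class Node:
    # Minimal hypothetical DOM node
    tag: str
    attrs: dict
    children: list = field(default_factory=list)
    text: str = ""


class TreeBuilder(xml.sax.ContentHandler):
    """Builds a tree from SAX events using a stack of open elements."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.root = None

    def startElement(self, name, attrs):
        node = Node(name, dict(attrs.items()))
        if self.stack:
            self.stack[-1].children.append(node)
        else:
            self.root = node
        self.stack.append(node)

    def characters(self, content):
        # May be called several times per text run, so accumulate.
        # Text interleaved with child elements is merged into one string.
        if self.stack:
            self.stack[-1].text += content

    def endElement(self, name):
        node = self.stack.pop()
        node.text = node.text.strip()


def build_tree(data: bytes) -> Node:
    builder = TreeBuilder()
    xml.sax.parse(io.BytesIO(data), builder)
    return builder.root
```

With this split, tokenizing and well-formedness checks live only in
the SAX layer, and the DOM layer is just one more event consumer.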

IMHO, nothing beats a good SAX parser.  I've used them in the past and
they work great, so I think having one in D would be ideal, especially
where the XML file is essentially read-only.

Do we need a DOM parser?  I honestly don't know.  Personally, I'd be 
happy with the associative array approach as it's simple.  I don't need 
to learn a new API just to navigate through XML.  Yes, I know there are 
advantages to using the DOM and XPath, which I also like, but for the 
most part, I don't need either.

Of course, I personally would love to just let XML die and use better 
data formats, but that's an unrealistic dream :)

Casey