Replacing std.xml

Thu Aug 29 00:47:17 PDT 2013

On Thursday, August 29, 2013 09:25:35 w0rp wrote:
> Hello everybody. I've been wondering, what are the current plans
> to replace std.xml? I'd like to help with the effort to get a
> final XML library in Phobos. So, I have a few questions.

Someone needs to step forward, write it, and get it through the review 
process. A while back, someone was working on a possible new version of 
std.xml, but they disappeared. No one has stepped up since. I'd love to do it 
if I had time, but I don't. There are probably several others around here in 
the same boat, but until someone who has the time and skill does do it, we 
won't have a new std.xml.

> First, and most importantly, what do we expect out of a D XML
> library? I'd really like to have a discussion of the form, "Here
> is exactly the interface the structs/classes need to implement,
> go forth and implement." 

Except that that's really the task of the person creating the new std.xml. 
Generally what happens is that the person writing the module comes up with an 
API and then presents it rather than asking others to come up with ideas to 
design it for them. Obviously, ideas can be discussed, but design-by-committee 
is arguably a bad idea. And it just works better to have a concrete design to 
discuss.

> The general idea in my mind is
> "something SAX-like, with something a little DOM-like."

What I personally think would be best is to have multiple parsers. First you 
have something StAX-like (or maybe even lower level; I don't recall exactly 
what StAX gives you at the moment) that basically tokenizes the XML and 
returns a range of tokens. Then SAX and DOM parsers can be built on top of that. 
That way, you get the fastest parser possible as well as higher level, more 
functional parsers.
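
To make the layering concrete, here is a minimal sketch, written in Python 
purely for illustration (in Phobos the token stream would be a D range rather 
than a Python generator). All names (Token, tokenize, sax_parse, Element, 
build_dom) are hypothetical. The tokenizer only handles attribute-free start 
tags, end tags, and character data. It does not handle attributes, 
self-closing tags, comments, CDATA, or entities. The point is that only the 
pull tokenizer looks at raw characters. The SAX-like and DOM-like layers 
just consume its output.

```python
from typing import Callable, Iterator, List, NamedTuple, Optional, Union


class Token(NamedTuple):
    kind: str   # "start", "end", or "text"
    value: str  # tag name or character data


def tokenize(xml: str) -> Iterator[Token]:
    """Lowest layer: lazily yield tokens one at a time (pull/StAX-like)."""
    i, n = 0, len(xml)
    while i < n:
        if xml[i] == "<":
            close = xml.index(">", i)  # ValueError if the tag is unterminated
            body = xml[i + 1:close]
            if body.startswith("/"):
                yield Token("end", body[1:])
            else:
                yield Token("start", body)
            i = close + 1
        else:
            nxt = xml.find("<", i)
            if nxt == -1:
                nxt = n
            yield Token("text", xml[i:nxt])
            i = nxt


def sax_parse(xml: str,
              on_start: Callable[[str], None],
              on_end: Callable[[str], None],
              on_text: Callable[[str], None]) -> None:
    """Middle layer: push the same tokens to callbacks (SAX-like)."""
    handlers = {"start": on_start, "end": on_end, "text": on_text}
    for tok in tokenize(xml):
        handlers[tok.kind](tok.value)


class Element:
    """A bare-bones DOM node: a tag name plus child elements and text."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.children: List[Union["Element", str]] = []


def build_dom(xml: str) -> Element:
    """Top layer: build an in-memory tree from the token stream (DOM-like)."""
    root: Optional[Element] = None
    stack: List[Element] = []
    for tok in tokenize(xml):
        if tok.kind == "start":
            elem = Element(tok.value)
            if stack:
                stack[-1].children.append(elem)
            elif root is None:
                root = elem
            else:
                raise ValueError("more than one root element")
            stack.append(elem)
        elif tok.kind == "end":
            if not stack or stack.pop().name != tok.value:
                raise ValueError(f"unexpected end tag </{tok.value}>")
        elif stack:  # text inside an element; text outside the root is dropped
            stack[-1].children.append(tok.value)
    if root is None or stack:
        raise ValueError("empty or unclosed document")
    return root
```

Because tokenize is a lazy generator, a caller pulling tokens directly can 
stop early without tokenizing the rest of the document. The two higher 
layers give up that control in exchange for convenience.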

But two of the biggest points of the design are that it's going to have to be 
range-based, and that it's going to need to take full advantage of slices 
(when used with strings or random-access ranges) in order to avoid copying 
any of the data. Avoiding those copies is the key design point that will 
allow a D parser to be extremely fast in comparison to parsers in most other 
languages.
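
In D, slicing a string or array gives you a view of the same memory rather 
than a copy, so a tokenizer can hand back slices of its input. The sketch 
below imitates that in Python with memoryview, whose slices share the 
underlying buffer. It is illustrative only. tokenize_slices and SliceToken 
are hypothetical names. The input is assumed to be attribute-free markup in 
ASCII or UTF-8, which is safe to scan byte by byte for '<' and '>'.

```python
from typing import Iterator, NamedTuple


class SliceToken(NamedTuple):
    kind: str          # "start", "end", or "text"
    data: memoryview   # a view into the caller's buffer, not a copy


def tokenize_slices(buf: bytes) -> Iterator[SliceToken]:
    """Yield tokens whose payloads are slices of buf; no bytes are copied."""
    view = memoryview(buf)
    i, n = 0, len(buf)
    while i < n:
        if buf[i] == ord("<"):  # indexing bytes yields an int
            close = buf.index(b">", i)  # ValueError if the tag is unterminated
            if buf[i + 1] == ord("/"):
                yield SliceToken("end", view[i + 2:close])
            else:
                yield SliceToken("start", view[i + 1:close])
            i = close + 1
        else:
            nxt = buf.find(b"<", i)
            if nxt == -1:
                nxt = n
            yield SliceToken("text", view[i:nxt])
            i = nxt
```

Each token's data still refers to the original buffer (its .obj is the input 
object). Calling bytes() on a slice is where a copy finally happens, and only 
when the caller asks for one.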

> I'm aware
> that std.xml has some issues supporting different encodings, so
> obviously that's included.

Personally, I would have just said use ranges of dchar and be done with it 
without worrying about character encodings at all, but I don't remember what 
all the XML standard does with encodings.

> Second, is there an existing library that has gotten close to
> meeting whatever we need for the first point? If so, how far away
> is it from being able to meet all of the requirements and become
> the standard library version?

There are several D XML libraries floating around, but no one has taken the 
time to get any of them prepared for the Phobos review queue, and I suspect 
that very few of them are range-based like the Phobos XML solution needs to 
be, but I don't know.

- Jonathan M Davis