The XML module in Phobos

Michel Fortin michel.fortin at michelf.com
Fri Jul 31 05:48:48 PDT 2009


On 2009-07-30 22:42:29 -0400, Benji Smith <dlanguage at benjismith.net> said:

>> Michael Rynn wrote:
>>> I did look at the code for the xml module, and posted a suggested bug
>>> fix to the empty elements problem. I do not have access rights to
>>> updating the source repository, and at the time was too busy for this.
> 
> Andrei Alexandrescu wrote:
>> It would be great if you could contribute to Phobos. Two things I hope 
>> from any replacement (a) works with ranges and ideally outputs ranges, 
>> (b) uses alias functions instead of delegates if necessary.
> 
> Interesting. Most XML parsers either produce a "Document" object, or 
> they just execute SAX callbacks. If an XML parser returned a range 
> object, how would you use it?
> 
> Usually, I use something like XPath to extract information from an XML 
> doc. Something like this:
> 
>     auto doc = parser.parse(xml);
>     auto nodes = doc.select("/root//whatever[0][@id]");
> 
> I can see how you might do depth-first or breadth-first traversal of 
> the DOM tree, or inorder traversal of the SAX events, with a range. But 
> that's not how most people use XML. Are there other range tricks up 
> your sleeve that would support a DOM or XPath kind of model?

A range is mostly a list of things. In the example above, doc.select 
could return a range to lazily evaluate the query instead of computing 
the whole query and returning all the elements. This way, if you only 
care about the first result you just take the first and don't have to 
compute them all.
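
A rough sketch of that idea, assuming a flat array of nodes and a simple 
name match instead of a real XPath engine. The Node type and the 
selectByName function are made up for illustration and are not part of 
std.xml; only std.algorithm's filter is an actual Phobos call:

```d
import std.algorithm : filter;
import std.stdio : writeln;

// Hypothetical node type, for illustration only (not std.xml).
struct Node
{
    string name;
    string id;
}

// Hypothetical stand-in for doc.select: instead of building an array
// of every match, it returns a lazy range. Matches are searched for
// only as the caller asks for them.
auto selectByName(Node[] nodes, string name)
{
    return nodes.filter!(n => n.name == name);
}

void main()
{
    auto nodes = [Node("other", "1"),
                  Node("whatever", "2"),
                  Node("whatever", "3")];

    auto matches = selectByName(nodes, "whatever");

    // Only the first result is requested, so the range never has to
    // advance to the remaining matches.
    if (!matches.empty)
        writeln(matches.front.id); // prints "2"
}
```

The caller decides how much work gets done: taking only the front stops 
at the first match, while a foreach over the range walks all of them.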

Ranges can be used everywhere there are lists, and are especially 
useful for lazy lists that compute things as you go. I made an XML 
tokenizer (similar to Tango's pull parser) with a range API. Basically, 
you iterate over various kinds of token made available through an 
Algebraic, and as you advance it parses the document to get you the 
next token. (It'd be more useful if you could switch on various kinds 
of tokens with an Algebraic -- right now you need to use "if 
(token.peek!OpenElementToken)" -- but that's a problem with Algebraic 
that should get fixed I believe, or else I'll have to use something 
else.)
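
To give an idea of the shape of such an API, here is a toy sketch, not my 
actual tokenizer. The token struct names other than OpenElementToken and 
the TokenRange type are assumptions for illustration, and the parsing only 
understands bare <tag>, </tag> and text (no attributes, entities or 
comments). Algebraic and peek come from std.variant:

```d
import std.stdio : writeln;
import std.string : indexOf;
import std.variant : Algebraic;

// Hypothetical token types; a real tokenizer would carry more detail.
struct OpenElementToken  { string name; }
struct CloseElementToken { string name; }
struct TextToken         { string text; }

alias Token = Algebraic!(OpenElementToken, CloseElementToken, TextToken);

// Input range that parses exactly one token per call to popFront.
struct TokenRange
{
    private string input;
    private Token current;
    private bool done;

    this(string xml)
    {
        input = xml;
        popFront(); // parse the first token
    }

    @property bool empty() const { return done; }
    @property Token front() { return current; }

    void popFront()
    {
        if (input.length == 0)
        {
            done = true;
            return;
        }
        if (input[0] == '<')
        {
            auto end = input.indexOf('>');
            if (end > 0)
            {
                string tag = input[1 .. end];
                input = input[end + 1 .. $];
                if (tag.length && tag[0] == '/')
                    current = CloseElementToken(tag[1 .. $]);
                else
                    current = OpenElementToken(tag);
                return;
            }
        }
        // Text runs up to the next '<', or to the end of the input.
        auto rest = input[1 .. $].indexOf('<');
        size_t len = rest < 0 ? input.length : cast(size_t) rest + 1;
        current = TextToken(input[0 .. len]);
        input = input[len .. $];
    }
}

void main()
{
    foreach (token; TokenRange("<a>hi</a>"))
    {
        // peek!T returns a pointer to the stored T, or null if the
        // Algebraic currently holds another type.
        if (auto opened = token.peek!OpenElementToken)
            writeln("open ", opened.name);
        else if (auto closed = token.peek!CloseElementToken)
            writeln("close ", closed.name);
        else if (auto txt = token.peek!TextToken)
            writeln("text ", txt.text);
    }
}
```

The chain of peek tests in main is the awkward part mentioned above: each 
token kind needs its own if, where a switch would read better.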

-- 
Michel Fortin
michel.fortin at michelf.com
http://michelf.com/



