Status of std.xml (D2/Phobos)

Alix Pexton alix.DOT.pexton at gmail.DOT.com
Tue Jun 29 07:36:16 PDT 2010


On 29/06/2010 13:27, Michel Fortin wrote:
> On 2010-06-29 04:41:50 -0400, Alix Pexton
> <alix.DOT.pexton at gmail.DOT.com> said:
>
>> On 28/06/2010 15:11, Steven Schveighoffer wrote:
>>
>>> Yes, I don't think the phobos solution needs to mimic exactly the API of
>>> SAX or DOM, the author should be free to use D idioms. But starting with
>>> a common proven design is probably a good idea.
>>>
>>> -Steve
>>
>> I've been thinking about it, and while I believe you when you say that
>> SAX can be used to build the DOM, I'm not convinced that SAX is the
>> lowest common abstraction.
>>
>> Michel Fortin's Tokenizer/Range seems much closer to the metal to me.
>
> It is closer to the metal, but there's a catch...
>
> One issue with SAX is that you must allocate an array of strings to pass
> the attributes of an element, which is probably going to need a dynamic
> allocation at some point. A lower-level abstraction such as mine (or
> Tango's pull-parser) just returns each attribute as a separate token as
> it parses them.
>
> The downside of the tokenizer interface is that it only checks for a
> subset of well-formedness: for instance, it doesn't check that tags
> balance each other correctly or that no two attributes have the same
> name. It's just a "tokenizer" after all; it can't be described as a
> conformant XML parser by itself. The upper layer parser needs to check
> for these things. My mini DOM built on this tokenizer does these checks
> when using the tokenizer, and it's more efficient to do them there
> because that's where the context information is kept, which is why the
> tokenizer doesn't do them.
>
> Implementing SAX on top of my tokenizer consists mostly of ensuring
> proper tag balancing, checking for duplicate attributes, and collecting
> attributes in an array (or another kind of list) you can then give to
> the openElement SAX callback.
>
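
A rough sketch of the layering described above, written in Python purely
for brevity rather than D. Every name here (the token tuples,
`sax_events`, `open_element`, `close_element`, `Recorder`) is invented
for illustration; this is neither the actual tokenizer API nor the real
SAX interface, only one way the three jobs (tag balancing, duplicate
attribute checks, attribute collection) might sit on top of a flat token
stream.

```python
def sax_events(tokens, handler):
    """Drive SAX-style callbacks from a flat stream of tokens.

    Assumed (hypothetical) token shapes:
      ("open", name)          start of a start tag: <name
      ("attr", name, value)   one attribute of the tag being opened
      ("open_end",)           the '>' that finishes the start tag
      ("text", data)          character data
      ("close", name)         an end tag: </name>
    """
    stack = []      # names of the elements currently open
    pending = None  # start tag that is still collecting attributes
    attrs = {}      # attributes gathered for the pending start tag

    for token in tokens:
        kind = token[0]
        if kind == "open":
            pending, attrs = token[1], {}
        elif kind == "attr":
            _, name, value = token
            if name in attrs:  # duplicate attribute check
                raise ValueError(f"duplicate attribute {name!r} on <{pending}>")
            attrs[name] = value
        elif kind == "open_end":
            stack.append(pending)
            handler.open_element(pending, attrs)  # attributes passed in one go
            pending = None
        elif kind == "text":
            handler.characters(token[1])
        elif kind == "close":
            if not stack or stack[-1] != token[1]:  # tag balancing check
                raise ValueError(f"unbalanced </{token[1]}>")
            stack.pop()
            handler.close_element(token[1])

    if stack:
        raise ValueError(f"unclosed <{stack[-1]}>")


class Recorder:
    """Toy handler that simply records the events it receives."""

    def __init__(self):
        self.events = []

    def open_element(self, name, attrs):
        self.events.append(("open", name, dict(attrs)))

    def characters(self, data):
        self.events.append(("text", data))

    def close_element(self, name):
        self.events.append(("close", name))
```

The only allocation forced by the SAX-style interface in this sketch is
the per-element attribute container; the token stream itself is consumed
one item at a time.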

My understanding was that SAX _doesn't_ check those things either, and
that it was up to the code responding to the events to tackle
well-formedness. After all, if SAX handled well-formedness, there would
be no need for it to pass an argument to closeElement to state which
element was being closed.

SAX has its place though. When it comes to doing a single-pass filter on
a stream of XML that can be assumed to be well-formed, its simplicity is
admittedly hard to beat. In other applications, however, there is much
room for improvement. SAXplus, with built-in element memoisation, an
element stack and a used-ID list, sounds quite useful to me, as long as
they remain optional of course.
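
To make the "optional extras" idea a little more concrete, here is a
minimal sketch in Python (chosen only for brevity, not D). `SaxPlus`,
`NameLog`, `track_stack` and `track_ids` are all hypothetical names, and
treating any attribute literally called `id` as an ID is a simplifying
assumption; memoisation is left out. The wrapper sits between the event
source and an ordinary handler, and each piece of bookkeeping can be
switched off.

```python
class SaxPlus:
    """Wraps a plain handler and adds optional bookkeeping (hypothetical API)."""

    def __init__(self, inner, track_stack=True, track_ids=True):
        self.inner = inner
        # None means "feature disabled"; otherwise the live bookkeeping state.
        self.stack = [] if track_stack else None
        self.used_ids = set() if track_ids else None

    def open_element(self, name, attrs):
        # Assumption: an attribute literally named "id" is the element's ID.
        if self.used_ids is not None and "id" in attrs:
            if attrs["id"] in self.used_ids:
                raise ValueError(f"id {attrs['id']!r} already used")
            self.used_ids.add(attrs["id"])
        if self.stack is not None:
            self.stack.append(name)
        self.inner.open_element(name, attrs)

    def close_element(self, name):
        # The name passed to close_element is what makes this check possible.
        if self.stack is not None:
            if not self.stack or self.stack[-1] != name:
                raise ValueError(f"unexpected </{name}>")
            self.stack.pop()
        self.inner.close_element(name)


class NameLog:
    """Toy inner handler: logs '+name' on open and '-name' on close."""

    def __init__(self):
        self.log = []

    def open_element(self, name, attrs):
        self.log.append("+" + name)

    def close_element(self, name):
        self.log.append("-" + name)
```

With both flags set to False the wrapper is a pure pass-through, which
is the behaviour a single-pass filter over trusted input would want.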

Admittedly, my initial disappointment when looking into SAX means that 
it is something that I have not followed for some time.

Hmm, I suddenly just got nostalgic for the days when XML was all shiny
and new and everyone was writing their own APIs or butchering old
SGML/HTML tech. Makes me want to go look at my old code ^^

A...


More information about the Digitalmars-d mailing list