Status of std.xml (D2/Phobos)
Michel Fortin
michel.fortin at michelf.com
Tue Jun 29 05:27:08 PDT 2010
On 2010-06-29 04:41:50 -0400, Alix Pexton <alix.DOT.pexton at gmail.DOT.com> said:
> On 28/06/2010 15:11, Steven Schveighoffer wrote:
>
>> Yes, I don't think the phobos solution needs to mimic exactly the API of
>> SAX or DOM, the author should be free to use D idioms. But starting with
>> a common proven design is probably a good idea.
>>
>> -Steve
>
> I've been thinking about it, and while I believe you when you say that
> SAX can be used to build the DOM, I'm not convinced that SAX is the
> lowest common abstraction.
>
> Michel Fortin's Tokenizer/Range seems much closer to the metal to me.
It is closer to the metal, but there's a catch...

One issue with SAX is that you must allocate an array of strings to
pass the attributes of an element, which is probably going to need a
dynamic allocation at some point. A lower-level abstraction such as
mine (or Tango's pull-parser) just returns each attribute as a separate
token as it parses them.

The downside of the tokenizer interface is that it only checks for a
subset of well-formedness: for instance, it doesn't check that tags
balance each other correctly or that no two attributes share the
same name. It's just a "tokenizer" after all; it can't be described
as a conformant XML parser by itself. The upper-layer parser needs to
check for these things. My mini DOM built on this tokenizer does these
checks as it consumes the tokens, and it's more efficient to do them
there because that's where the context information is kept, which is
why the tokenizer doesn't do them.

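
To make the shape of such a token stream concrete, here is a rough
sketch in Python rather than D, purely for illustration. It is not my
actual tokenizer: the function name `tokenize`, the tuple layout and the
token kinds ("open", "attr", "close", "text") are all made up for this
example, and it handles only a tiny subset of XML (no comments, entities,
CDATA or single-quoted attributes). The point is that each attribute
comes out as its own token, and nothing checks balancing or duplicates.

```python
import re

# Hypothetical toy grammar: start/end tags with double-quoted attributes only.
_NAME = r"[A-Za-z_][\w.-]*"
_TAG = re.compile(
    r"<(/?)(" + _NAME + r")((?:\s+" + _NAME + r"\s*=\s*\"[^\"]*\")*)\s*(/?)>"
)
_ATTR = re.compile(r"(" + _NAME + r")\s*=\s*\"([^\"]*)\"")


def tokenize(text):
    """Yield tokens lazily; attributes are emitted one at a time."""
    pos = 0
    while pos < len(text):
        m = _TAG.match(text, pos)
        if m is None:
            # Anything that is not a recognised tag is passed through as text.
            end = text.find("<", pos + 1)
            if end == -1:
                end = len(text)
            yield ("text", text[pos:end])
            pos = end
            continue
        closing, name, attrs, self_closing = m.groups()
        if closing:
            yield ("close", name)
        else:
            yield ("open", name)
            # No list is built here: each attribute is its own token.
            for a in _ATTR.finditer(attrs):
                yield ("attr", a.group(1), a.group(2))
            if self_closing:
                yield ("close", name)
        pos = m.end()
```

Feeding it `<a x="1" x="2"><b></a>` produces tokens without complaint,
even though the attribute is duplicated and `b` is never closed. Catching
that is left to whatever consumes the tokens.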
Implementing SAX on top of my tokenizer consists mostly of ensuring
proper tag balancing, checking for duplicate attributes, and collecting
attributes in an array (or another kind of list) you can then give to
the openElement SAX callback.
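
A minimal sketch of that layering, again in Python for illustration and
not taken from my code. It assumes a token stream of tuples such as
("open", name), ("attr", name, value), ("close", name) and ("text", data);
that layout, the function `drive_sax`, the exception class and the
callback names are all hypothetical stand-ins (`open_element` plays the
role of the openElement callback).

```python
class WellFormednessError(Exception):
    """Raised for errors a bare tokenizer would not catch."""


def drive_sax(tokens, open_element, close_element, characters):
    stack = []      # names of currently open elements, for tag balancing
    pending = None  # start tag whose attributes may still be arriving
    attrs = []      # attributes collected for the pending start tag

    def flush():
        # Deliver the buffered start tag once its attributes are complete.
        nonlocal pending, attrs
        if pending is not None:
            open_element(pending, attrs)
            pending, attrs = None, []

    for tok in tokens:
        kind = tok[0]
        if kind == "attr":
            name, value = tok[1], tok[2]
            if pending is None:
                raise WellFormednessError("attribute outside a start tag")
            if any(existing == name for existing, _ in attrs):
                raise WellFormednessError(f"duplicate attribute {name!r}")
            attrs.append((name, value))
            continue
        flush()  # any non-attribute token ends the pending start tag
        if kind == "open":
            stack.append(tok[1])
            pending = tok[1]
        elif kind == "close":
            if not stack or stack[-1] != tok[1]:
                raise WellFormednessError(f"unbalanced end tag {tok[1]!r}")
            stack.pop()
            close_element(tok[1])
        elif kind == "text":
            characters(tok[1])
    flush()
    if stack:
        raise WellFormednessError(f"unclosed element {stack[-1]!r}")
```

The `attrs` list built per element is exactly the allocation the
tokenizer level avoids; a consumer that doesn't need SAX-style callbacks
can skip it and handle attribute tokens as they arrive.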
--
Michel Fortin
michel.fortin at michelf.com
http://michelf.com/