std.xml and Adam D Ruppe's dom module

Jacob Carlborg doob at me.com
Tue Feb 7 23:38:55 PST 2012


On 2012-02-08 02:44, Jonathan M Davis wrote:
> On Tuesday, February 07, 2012 00:56:40 Adam D. Ruppe wrote:
>> On Monday, 6 February 2012 at 23:47:08 UTC, Jonathan M Davis wrote:
>>> Also, two of the major requirements for an improved std.xml are
>>> that it needs to have a range-based API, and it needs to be
>>> fast.
>>
>> What does range based API mean in this context? I do offer
>> a couple ranges over the tree, but it really isn't the main
>> thing there.
>>
>> Check out Element.tree() for the main one.
>>
>>
>> But, if you mean taking a range for input, no, doesn't
>> do that. I've been thinking about rewriting the parse
>> function (if you look at it, you'll probably hate it
>> too!). But, what I have works and is tested on a variety
>> of input, including garbage that was a pain to get working
>> right, so I'm in no rush to change it.
>>
>>> Tango's XML parser has pretty much set the bar on speed
>>
>> Yeah, I'm pretty sure Tango whips me hard on speed. I spent
>> some time in the profiler a month or two ago and got a
>> significant speedup over the datasets I use (html files),
>> but I'm sure there's a whole lot more that could be done.
>>
>>
>>
>> The biggest thing is I don't think you could use my parse
>> function as a stream.
>
> Ideally, std.xml would operate on ranges of dchar (but obviously be optimized
> for strings, since there are lots of optimizations that can be done with
> string processing - at least as far as unicode goes) and it would return a
> range of some kind. The result would probably be a document type of some kind
> which provided a range of its top level nodes (or maybe just the root node)
> which each then provided ranges over their sub-nodes, etc. At least, that's
> the kind of thing that I would expect. Other calls on the document and nodes
> may not be range-based at all (e.g. xpaths should probably be supported, and
> that doesn't necessarily involve ranges). The best way to handle it all would
> probably depend on the implementation. I haven't implemented a full-blown XML
> parser, so I don't know what the best way to go about it would be, but
> ideally, you'd be able to process the nodes as a range.
>
> - Jonathan M Davis

I think there should be a pull or SAX parser at the lowest level, and
then an XML document module built on top of that parser.
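
To make the layering concrete, here is a rough sketch of the idea. It is
written in Python rather than D, and it uses the standard library's
xml.etree.ElementTree.XMLPullParser as a stand-in for the low-level
parser (note that this class keeps its own element objects internally, so
it is only an approximation of a true pull parser). The names Node,
pull_events and build_document are hypothetical, made up for this
illustration, and are not a proposal for the actual std.xml API.

```python
import xml.etree.ElementTree as ET


class Node:
    """Hypothetical document node; not part of any real library."""

    def __init__(self, tag, attrib):
        self.tag = tag
        self.attrib = dict(attrib)
        self.text = ""
        self.children = []


def pull_events(chunks):
    """Lowest level: feed input piece by piece, yield (event, element) pairs."""
    parser = ET.XMLPullParser(events=("start", "end"))
    for chunk in chunks:
        parser.feed(chunk)
        # Hand out whatever events the parser has produced so far.
        yield from parser.read_events()
    parser.close()
    yield from parser.read_events()


def build_document(events):
    """Upper level: consume the event stream and assemble a tree of Nodes."""
    stack = []
    root = None
    for event, elem in events:
        if event == "start":
            node = Node(elem.tag, elem.attrib)
            if stack:
                stack[-1].children.append(node)
            else:
                root = node
            stack.append(node)
        else:  # "end": the element's leading text is complete now
            node = stack.pop()
            node.text = (elem.text or "").strip()
    return root
```

Something like build_document(pull_events(chunks)) would then give you the
document, while code that only cares about speed or streaming could loop
over pull_events directly and never build a tree at all.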

-- 
/Jacob Carlborg

