Phobos Proposal: replace std.xml with kxml.

Tue May 4 14:56:33 PDT 2010

On 2010-05-04 12:09:29 -0400, Andrei Alexandrescu 
<SeeWebsiteForEmail at erdani.org> said:

> Graham Fawcett wrote:
>> By "adapt" do you mean writing a wrapper for an existing library, or 
>> translating the source code of the library into D?
>> What constitutes a "generous license" in this context? (For what it's 
>> worth, libxml2 is under the MIT License.)
>> 
>> Graham
> 
> We'd need to modify the code. I haven't looked into available XML 
> libraries, so I don't know which would be eligible.

I think if you wanted to port an XML library to make use of ranges, the 
only viable option is probably to find one based on C++ iterators. 
Otherwise it'll look more like a rewrite than a port, and at that point 
why not write one from scratch?

Anyway, just in case, would you be interested in an XML tokenizer and a 
simple DOM following this model?

	http://michelf.com/docs/d/mfr/xmltok.html
	http://michelf.com/docs/d/mfr/xml.html

At its base is a pull parser and an event parser combined in a single 
function template, "tokenize", which lets you alternate between 
event-based and pull parsing at will. I'm using it, but its development 
is on hold for now; I'm only maintaining it so that it compiles with the 
newest versions of DMD.

The only thing it doesn't parse at this time is inline DTDs inside the doctype.

Also, it currently works only with strings, for simplicity and 
performance. There is one issue with non-string parsing: when parsing 
a string, it's easy to just slice the string and pass the slices around, 
but if you're parsing from a generic input range, you basically have to 
copy characters one by one, which is much less efficient. So ideally the 
algorithm should use slices whenever it can (that is, when the input is 
a string).
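To make the cost difference concrete, here is a small Python sketch (an illustration, not code from the D library). With a buffer you can index, reading an XML name means finding where it ends and returning a slice; a memoryview slice shares the underlying bytes instead of copying them, much like a D string slice. With a plain iterator, every byte has to be appended to a new buffer as it is consumed. The function names and the ASCII-only name test are simplifying assumptions.

```python
from typing import Iterator, Optional, Tuple


def is_name_byte(b: int) -> bool:
    # ASCII-only approximation of XML name characters (not the full spec).
    return (b < 128 and chr(b).isalnum()) or b in b"_-.:"


def read_name_sliced(buf: memoryview, pos: int) -> Tuple[memoryview, int]:
    """Sliceable input: scan for the end, then return a view (no copy)."""
    end = pos
    while end < len(buf) and is_name_byte(buf[end]):
        end += 1
    return buf[pos:end], end


def read_name_copied(it: Iterator[int]) -> Tuple[bytes, Optional[int]]:
    """Generic input: each byte is copied out as it is consumed.

    A Python iterator cannot peek, so the byte that ended the name is
    returned too (None if the input ran out).
    """
    out = bytearray()
    for b in it:
        if not is_name_byte(b):
            return bytes(out), b
        out.append(b)
    return bytes(out), None
```

Both functions produce the same name; only the sliced version avoids building a second buffer.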

I'm not sure yet how to attack this problem, but I'm thinking that 
perhaps parsing primitives should be "part of" the range interface, in 
the sense that a range should provide specialized implementations of 
those primitives when it can implement them more efficiently (for 
instance by slicing). You wrote a while ago about designing parsing 
primitives; is that part of Phobos now?
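One way to picture "primitives as part of the range interface" is sketched below in Python, under stated assumptions. The range methods empty/front/pop_front loosely mirror D's input-range primitives empty/front/popFront, and take_while is a hypothetical parsing primitive, not an existing Phobos function. A generic take_while uses the range's own implementation when one exists and otherwise falls back to copying one element at a time. Python checks for the method at run time; in D this dispatch would presumably be done at compile time (e.g. with static if).

```python
from typing import Callable, Iterator


class StringRange:
    """Sliceable range over a str; supplies its own take_while."""

    def __init__(self, s: str, pos: int = 0) -> None:
        self.s, self.pos = s, pos

    @property
    def empty(self) -> bool:
        return self.pos >= len(self.s)

    @property
    def front(self) -> str:
        return self.s[self.pos]

    def pop_front(self) -> None:
        self.pos += 1

    def take_while(self, pred: Callable[[str], bool]) -> str:
        # Specialized primitive: scan indices, then take a single slice.
        end = self.pos
        while end < len(self.s) and pred(self.s[end]):
            end += 1
        result, self.pos = self.s[self.pos:end], end
        return result


class IteratorRange:
    """Input range over any iterator: one element of lookahead, no slicing."""

    _END = object()

    def __init__(self, it: Iterator[str]) -> None:
        self._it = it
        self._front = next(it, IteratorRange._END)

    @property
    def empty(self) -> bool:
        return self._front is IteratorRange._END

    @property
    def front(self) -> str:
        return self._front

    def pop_front(self) -> None:
        self._front = next(self._it, IteratorRange._END)


def take_while(r, pred: Callable[[str], bool]) -> str:
    """Generic primitive: prefer the range's specialization if it has one."""
    specialized = getattr(r, "take_while", None)
    if specialized is not None:
        return specialized(pred)
    # Fallback for plain input ranges: copy characters one by one.
    out = []
    while not r.empty and pred(r.front):
        out.append(r.front)
        r.pop_front()
    return "".join(out)
```

A parser written against take_while then works on both kinds of input, and it automatically gets the faster path whenever the range can slice.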

Anyway, the problem above is probably the one reason we might want to 
write the parser from scratch: it needs to bind to specializable 
higher-level parsing functions to take advantage of the performance 
characteristics of certain ranges, such as those you can slice.

-- 
Michel Fortin
michel.fortin at michelf.com
http://michelf.com/