Phobos Proposal: replace std.xml with kxml.

Tue May 4 11:56:31 PDT 2010

Graham Fawcett wrote:
> On Tue, 04 May 2010 09:09:29 -0700, Andrei Alexandrescu wrote:
> 
>> Graham Fawcett wrote:
>>> On Mon, 03 May 2010 16:01:30 -0700, Andrei Alexandrescu wrote:
>>>
>>>> Graham Fawcett wrote:
>>>>> The fact that libxml2/libxslt support not only XML parsing and DOM
>>>>> building, but also XSLT, XPath, XPointer, XInclude, RelaxNG, etc.,
>>>>> means that any homegrown library will be hard-pressed to cover the
>>>>> same range of tools and features.
>>>>>
>>>>> There are too many half-baked XML libraries in the world. No
>>>>> disrespect intended to opticron or anyone else; it just doesn't make
>>>>> a lot of sense to reinvent such a complex wheel (and believing that
>>>>> XML processing isn't complex is a sure sign that your homegrown
>>>>> library's design is incomplete!).
>>>>>
>>>>> Graham
>>>> I think what we need for the standard library is to take a solid XML
>>>> library licensed generously and adapt it to work with arbitrary
>>>> ranges.
>>> By "adapt" do you mean writing a wrapper for an existing library, or
>>> translating the source code of the library into D?
>>>
>>> What constitutes a "generous license" in this context? (For what it's
>>> worth, libxml2 is under the MIT License.)
>>>
>>> Graham
>> We'd need to modify the code. I haven't looked into available xml
>> libraries so I don't know which would be eligible.
> 
> I think I understand your motivations: this is standard library, and
> so you want to minimize dependencies. But from a maintenance
> perspective, it seems a bad idea to translate a complex library into D
> code that few people will actively maintain -- whereas writing a
> wrapper (and introducing a library dependency) would keep the codebase
> small, let you share maintenance costs with the third-party library's
> developers, and (arguably) increase the stability and quality of the
> stdlib?
> 
> I am not pushing for libxml2 as The Answer. I'm just questioning the
> motivation to translate other people's code to D, when the D platform
> excels at library integration. (Although I agree with your suggestion
> to borrow inspiration/code from Boost for datetime and other features;
> that's different, since Boost cannot feasibly be wrapped.)
> 
> Best,
> Graham

My concern is purely technical - a library we just link to would force a 
number of choices, such as input representation (e.g. arrays of char). 
Ideally we should be able to change the library to accept any compatible 
range of any compatible characters.

As a simple example, consider std.algorithm.levenshteinDistance. There 
are plenty of good implementations and initially I just wrote one almost 
identical to the Web lore. However, later I needed to compute 
Levenshtein distances between strings stored in lists (tries, actually). 
Well, that doesn't work because the implementation at that time used 
random access s[i] and t[j] all over the place. But it wasn't difficult 
to change the algorithm to work with forward ranges. So now we have one 
of the few Levenshtein distance implementations that work with inputs 
other than arrays. In particular, we work correctly with UTF inputs 
without needing to copy the input, something that I haven't seen 
anywhere else. If you google for "levenshtein utf", Google will even 
think the query has a typo. Search results include an OCaml 
implementation that copies the input 
(http://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Levenshtein_distance#OCaml) 
and a Ruby implementation that also copies the input 
(http://rubyforge.org/frs/?group_id=2080&release_id=7389). By using the 
range abstraction, we get to support UTF Levenshtein without significant 
additional implementation effort - the code is very similar to the one 
using indices throughout.
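
The forward-range idea above can be sketched roughly as follows. This is 
an illustration in Python, not the std.algorithm code: the names 
levenshtein and LinkedList are hypothetical, and Python iterables stand 
in for D forward ranges. The only assumption about the inputs is that s 
can be traversed once and t can be traversed repeatedly from its start; 
neither input is ever indexed.

```python
from typing import Hashable, Iterable


def levenshtein(s: Iterable[Hashable], t: Iterable[Hashable]) -> int:
    """Edit distance computed with forward traversal only.

    `s` is walked once; `t` is walked once per element of `s`, so it
    must be re-iterable (a list, str, or the LinkedList below, but not
    a one-shot generator). No s[i] or t[j] access is performed.
    """
    # Row 0: distance from the empty prefix of s to each prefix of t.
    prev = [0]
    for _ in t:
        prev.append(prev[-1] + 1)

    for i, a in enumerate(s, start=1):
        cur = [i]  # distance from s[:i] to the empty prefix of t
        for j, b in enumerate(t, start=1):
            cost = 0 if a == b else 1
            cur.append(min(prev[j] + 1,          # delete a
                           cur[j - 1] + 1,       # insert b
                           prev[j - 1] + cost))  # substitute a -> b
        prev = cur
    return prev[-1]


class _Node:
    def __init__(self, value, next_node=None):
        self.value = value
        self.next_node = next_node


class LinkedList:
    """Hypothetical singly linked list: forward iteration, no indexing."""

    def __init__(self, items=()):
        self.head = None
        for item in reversed(list(items)):
            self.head = _Node(item, self.head)

    def __iter__(self):
        node = self.head
        while node is not None:
            yield node.value
            node = node.next_node
```

For instance, levenshtein(LinkedList("kitten"), LinkedList("sitting")) 
walks both lists node by node and returns 3. The scratch rows prev and 
cur are indexed, but they belong to the algorithm rather than to the 
inputs, which is why the change from arrays to forward-only inputs stays 
small.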

Andrei