Replacing std.xml

Brad Anderson eco at gnuk.net
Thu Aug 29 12:34:37 PDT 2013


On Thursday, 29 August 2013 at 18:58:57 UTC, H. S. Teoh wrote:
> No kidding! I was trying to write a program that navigates a website
> automatically using std.net.curl, and I'm running into all sorts of
> silly roadblocks, including std.encoding not supporting iso-8859-*
> encodings.
>

Adding the rest of the ISO-8859 encodings doesn't look all that
difficult if you use the existing ISO-8859-1 (Latin1) implementation
as a base. I don't quite understand where and how the transcoding is
done, though.
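
To make the "Latin1 as a base" idea concrete, here is a minimal
standalone sketch for one sibling encoding, ISO-8859-15. It does not
use std.encoding's internals (which, as noted, I don't fully follow);
the name decodeLatin9 is made up for illustration. ISO-8859-15 differs
from ISO-8859-1 at only eight byte values, so the decoder is the
Latin1 identity mapping plus a small override table:

```d
import std.stdio : writeln;

/// Hypothetical helper (not part of Phobos): decode ISO-8859-15 bytes
/// into a UTF-8 string, treating ISO-8859-1 as the base mapping.
string decodeLatin9(const(ubyte)[] input)
{
    string result;
    foreach (ubyte b; input)
    {
        dchar c;
        switch (b)
        {
            // The eight positions where ISO-8859-15 differs from ISO-8859-1.
            case 0xA4: c = '\u20AC'; break; // euro sign
            case 0xA6: c = '\u0160'; break; // S with caron
            case 0xA8: c = '\u0161'; break; // s with caron
            case 0xB4: c = '\u017D'; break; // Z with caron
            case 0xB8: c = '\u017E'; break; // z with caron
            case 0xBC: c = '\u0152'; break; // OE ligature
            case 0xBD: c = '\u0153'; break; // oe ligature
            case 0xBE: c = '\u0178'; break; // Y with diaeresis
            // Everywhere else the byte value is the code point, as in Latin1.
            default: c = cast(dchar) b; break;
        }
        result ~= c; // appending a dchar to a string encodes it as UTF-8
    }
    return result;
}

void main()
{
    immutable(ubyte)[] bytes = [0x31, 0x30, 0x20, 0xA4]; // "10 " plus euro sign
    writeln(decodeLatin9(bytes));
}
```

The other ISO-8859 parts would need larger override tables, but the
shape should stay the same.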

> The good news is that on Linux, there's a handy utility called 'recode',
> which comes with a library called 'librecode', that supports converting
> between a huge number of different encodings -- many more than probably
> you or I have imagined existed -- including to/from Unicode.  I know we
> don't like including external libraries in Phobos, but I honestly don't
> see any justification for reinventing the wheel by writing (and
> maintaining!) our own equivalent to librecode, unless licensing issues
> prevent us from including librecode in Phobos, nicely wrapped in a
> modern range-based D API.
>
>
>> However, because all of the XML special symbols should be ASCII, you
>> should still be able to avoid decoding characters for the most part.
>> It's only when you have to actually look at the content that Unicode
>> would potentially matter. So, the performance hit of decoding Unicode
>> characters should mostly be able to be avoided.
> [...]
>
> One way is to write the core code of std.xml in such a way that it
> handles all data as ubyte[] (or ushort[]/uint[] for 16-bit/32-bit
> encodings) so that it's encoding-independent. Then on top of this core,
> write some convenience wrappers that cast/convert to string, wstring,
> dstring. As an initial stab, we could support only UTF-8, UTF-16, UTF-32
> if the user asks for string/wstring/dstring, and leave XML in other
> encodings up to the user to decode manually. This way, at least the user
> can get the data out of the file.
>
> Later on, once we've gotten our act together with std.encoding, we can
> hook it up to std.xml to provide autoconversion.
>
>
> T
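
A rough sketch of the layering described above, under heavy
assumptions: the names Piece, splitMarkup and asUtf8 are invented
here, and the scanner ignores comments, CDATA sections and quoted
attribute values. The core only compares raw bytes against '<' and
'>', which is safe for UTF-8 and the ISO-8859 family because every
byte of a multi-byte UTF-8 sequence is 0x80 or above. Decoding and
validation happen only in the wrapper, when the caller actually asks
for text:

```d
import std.stdio : writeln;
import std.string : representation;
import std.utf : validate;

/// Hypothetical token type: a slice of the input plus a tag/text flag.
struct Piece
{
    bool isTag;
    const(ubyte)[] data;
}

/// Encoding-agnostic core: split at '<' and '>' by comparing raw bytes.
/// Nothing is decoded and nothing is copied; pieces are slices of input.
Piece[] splitMarkup(const(ubyte)[] input)
{
    Piece[] pieces;
    size_t start = 0;
    bool inTag = false;
    foreach (i, b; input)
    {
        if (!inTag && b == '<')
        {
            if (i > start)
                pieces ~= Piece(false, input[start .. i]); // pending text
            start = i;
            inTag = true;
        }
        else if (inTag && b == '>')
        {
            pieces ~= Piece(true, input[start .. i + 1]);
            start = i + 1;
            inTag = false;
        }
    }
    if (start < input.length)
        pieces ~= Piece(inTag, input[start .. $]); // trailing text or unterminated tag
    return pieces;
}

/// Convenience wrapper: view a piece as UTF-8, validating on demand.
const(char)[] asUtf8(const(ubyte)[] data)
{
    auto text = cast(const(char)[]) data;
    validate(text); // throws UTFException on malformed UTF-8
    return text;
}

void main()
{
    auto doc = "<p>na\u00EFve</p>".representation;
    foreach (piece; splitMarkup(doc))
        writeln(piece.isTag ? "tag:  " : "text: ", asUtf8(piece.data));
}
```

Input in some other encoding would still go through splitMarkup
unchanged; only the wrapper would need replacing, which is where
std.encoding could be hooked in later.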


