Replacing std.xml

H. S. Teoh hsteoh at quickfur.ath.cx
Thu Aug 29 14:27:22 PDT 2013


On Thu, Aug 29, 2013 at 12:41:16PM -0700, Sean Kelly wrote:
> On Aug 29, 2013, at 11:57 AM, H. S. Teoh <hsteoh at quickfur.ath.cx> wrote:
> > 
> > One way is to write the core code of std.xml in such a way that it
> > handles all data as ubyte[] (or ushort[]/uint[] for 16-bit/32-bit
> > encodings) so that it's encoding-independent. Then on top of this
> > core, write some convenience wrappers that cast/convert to string,
> > wstring, dstring. As an initial stab, we could support only UTF-8,
> > UTF-16, UTF-32 if the user asks for string/wstring/dstring, and
> > leave XML in other encodings up to the user to decode manually. This
> > way, at least the user can get the data out of the file.
> > 
> > Later on, once we've gotten our act together with std.encoding, we
> > can hook it up to std.xml to provide autoconversion.
> 
> As long as autoconversion is optional.  When parsing XML or JSON or
> whatever, I generally only care about specific strings, and sometimes
> don't want anything decoded at all.  Having decoding done
> automatically before the event fires is a huge and potentially
> unnecessary performance hit.  Not doing this decoding automatically is
> what makes the Tango XML parser so fast.

Right, that's why I said the core of std.xml should handle everything as
bytes, giving special treatment only to the ASCII values of <, >, &, and
other metacharacters. The tag name and tag body should just be ranges
over segments of the input.
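
To make that concrete, here is a rough sketch of what such a byte-level
core might look like. It is only an illustration under some loud
assumptions: the names Kind, Segment and ByteSegments are made up for
this example and are not part of std.xml or any proposed API, the input
is assumed to be in an ASCII-compatible encoding such as UTF-8 (UTF-16
or UTF-32 would need the ushort[]/uint[] variant), and it splits only on
'<' and '>', ignoring comments, CDATA, entities and '>' inside attribute
values. The point is just that each segment is a slice of the original
buffer, so nothing is copied or decoded unless the caller asks for it.

```d
import std.stdio : writeln;

/// Kind of segment found in the raw input (hypothetical names).
enum Kind { tag, text }

/// A segment is a slice of the original buffer; no copy, no decoding.
struct Segment
{
    Kind kind;
    const(ubyte)[] data;
}

/// Input range over segments of the input, splitting only on the
/// ASCII bytes '<' and '>'. All other bytes are passed through untouched.
struct ByteSegments
{
    private const(ubyte)[] rest;
    private Segment cur;
    private bool done;

    this(const(ubyte)[] input)
    {
        rest = input;
        popFront(); // prime the first segment
    }

    @property bool empty() const { return done; }
    @property Segment front() { return cur; }

    void popFront()
    {
        if (rest.length == 0)
        {
            done = true;
            return;
        }
        if (rest[0] == '<')
        {
            // Tag: bytes between '<' and the next '>' (or end of input
            // if the tag is unterminated).
            size_t i = 1;
            while (i < rest.length && rest[i] != '>')
                ++i;
            cur = Segment(Kind.tag, rest[1 .. i]);
            rest = i < rest.length ? rest[i + 1 .. $] : rest[$ .. $];
        }
        else
        {
            // Text: bytes up to, but not including, the next '<'.
            size_t i = 0;
            while (i < rest.length && rest[i] != '<')
                ++i;
            cur = Segment(Kind.text, rest[0 .. i]);
            rest = rest[i .. $];
        }
    }
}

void main()
{
    auto xml = cast(const(ubyte)[]) "<a>hi</a>";
    foreach (seg; ByteSegments(xml))
    {
        // The cast to char[] is the caller's decision; the core never decodes.
        writeln(seg.kind, ": ", cast(const(char)[]) seg.data);
    }
}
```

A string/wstring/dstring convenience layer would then sit on top of
this and convert only the segments the caller actually looks at, which
is what keeps the decoding cost optional.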


T

-- 
What are you when you run out of Monet? Baroque.

