std.xml and Adam D Ruppe's dom module

Sean Kelly sean at invisibleduck.org
Thu Feb 9 07:51:26 PST 2012


For XML, template the parser on char type so transcoding is unnecessary. Since JSON is UTF-8 I'd use char there, and at least for the event parser don't proactively decode strings--let the user do this. In fact, don't proactively decode anything. Give me the option of getting a number via its string representation directly from the input buffer. Roughly, JSON events should be:

Enter object
Object key
Int value (as string)
Float value (as string)
Null
True
False
Etc. 
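
As a rough illustration of the event list above, here is a minimal sketch in Python (not D, and not taken from any existing std.json or std.xml code). The function name json_events and the event names are hypothetical, chosen only for this example. It assumes the whole input is already in memory as one string, and it is a lenient tokenizer rather than a validator. Numbers and strings are handed back as raw, undecoded input text. In a language with real array slices, the raw text would be a view into the input buffer; Python copies on slicing, so this only models the interface.

```python
import re
from typing import Iterator, List, Tuple

# Hypothetical patterns for this sketch; group 1 is the fraction, group 2 the exponent.
_NUMBER = re.compile(r'-?(?:0|[1-9][0-9]*)(\.[0-9]+)?([eE][+-]?[0-9]+)?')
_STRING = re.compile(r'"(?:[^"\\]|\\.)*"')


def json_events(buf: str) -> Iterator[Tuple[str, str]]:
    """Yield (event, raw_text) pairs. Nothing is decoded or converted."""
    stack: List[str] = []   # 'o' while inside an object, 'a' inside an array
    expect_key = False      # True when the next string is an object key
    i, n = 0, len(buf)
    while i < n:
        c = buf[i]
        if c in " \t\r\n" or c == ":":
            i += 1
        elif c == ",":
            expect_key = bool(stack) and stack[-1] == "o"
            i += 1
        elif c in "{[":
            stack.append("o" if c == "{" else "a")
            expect_key = c == "{"
            yield ("enter_object" if c == "{" else "enter_array", "")
            i += 1
        elif c in "}]":
            if not stack:
                raise ValueError(f"unbalanced {c!r} at offset {i}")
            kind = stack.pop()
            expect_key = False
            yield ("exit_object" if kind == "o" else "exit_array", "")
            i += 1
        elif c == '"':
            m = _STRING.match(buf, i)
            if m is None:
                raise ValueError(f"unterminated string at offset {i}")
            raw = m.group()[1:-1]   # escapes such as \n are left undecoded
            yield ("key" if expect_key else "string", raw)
            expect_key = False
            i = m.end()
        elif c in "-0123456789":
            m = _NUMBER.match(buf, i)
            if m is None:
                raise ValueError(f"bad number at offset {i}")
            is_float = m.group(1) is not None or m.group(2) is not None
            yield ("float" if is_float else "int", m.group())  # value as string
            i = m.end()
        else:
            for lit in ("null", "true", "false"):
                if buf.startswith(lit, i):
                    yield (lit, lit)
                    i += len(lit)
                    break
            else:
                raise ValueError(f"unexpected character {c!r} at offset {i}")
```

A caller who only wants one field can ignore every other event, and only pays for number conversion or string unescaping on the values it actually keeps. Because the number arrives as text, the caller also decides whether it becomes an integer, a float or a decimal type.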

On Feb 8, 2012, at 6:49 PM, "Robert Jacques" <sandford at jhu.edu> wrote:

> On Wed, 08 Feb 2012 02:12:57 -0600, Johannes Pfau <nospam at example.com> wrote:
>> Am Tue, 07 Feb 2012 20:44:08 -0500
>> schrieb "Jonathan M Davis" <jmdavisProg at gmx.com>:
>>> On Tuesday, February 07, 2012 00:56:40 Adam D. Ruppe wrote:
>>> > On Monday, 6 February 2012 at 23:47:08 UTC, Jonathan M Davis
> [snip]
>> 
>> Using ranges of dchar directly can be horribly inefficient in some
>> cases, you'll need at least some kind of buffered dchar range. Some
>> std.json replacement code tried to use only dchar ranges and had to
>> reassemble strings character by character using Appender. That sucks
>> especially if you're only interested in a small part of the data and
>> don't care about the rest.
>> So for pull/sax parsers: Use buffering, return strings (better:
>> w/d/char[]) as slices to that buffer. If the user needs to keep a
>> string, he can still copy it. (String decoding should also be done
>> on-demand only).
> 
> Speaking as the one proposing said Json replacement, I'd like to point out that JSON strings != UTF strings: manual conversion is required some of the time. And I use appender as a dynamic buffer in exactly the manner you suggest. There's even an option to use a string cache to minimize total memory usage. (Hmm... that functionality should probably be refactored out and made into its own utility.) That said, I do end up doing a bunch of useless encodes and decodes, so I'm going to special-case those away and add slicing support for strings. wstrings and dstrings will still need to be converted, as currently Json values only accept strings and therefore Json tokens also only support strings. As a potential user of the sax/pull interface, would you prefer the extra clutter of special side channels for zero-copy wstrings and dstrings?

