std.jgrandson
Sean Kelly via Digitalmars-d
digitalmars-d at puremagic.com
Sun Aug 3 13:40:46 PDT 2014
On Sunday, 3 August 2014 at 17:40:48 UTC, Andrei Alexandrescu wrote:
> On 8/3/14, 10:19 AM, Sean Kelly wrote:
>> I don't want to pay for anything I don't use. No allocations should
>> occur within the parser and it should simply slice up the input.
>
> What to do about arrays and objects, which would naturally
> allocate arrays and associative arrays respectively? What about
> strings with backslash-encoded characters?
This is tricky with a range. With an event-based parser I'd have
events for object and array begin / end, but with a range you end
up having an element that's a token, which is pretty weird. For
encoded characters (and you need to make sure you handle
surrogate pairs in your decoder) I'd still provide some means of
decoding on demand. If nothing else, decode lazily when the user
asks for the string value. That way the user isn't paying to
decode strings he isn't interested in.
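
To make the decode-on-demand idea concrete, here is a minimal sketch. It is written in Python purely for illustration, and `LazyString`, `value`, and `SIMPLE_ESCAPES` are hypothetical names, not part of any proposed module. It assumes the parser has already validated the string and hands over only the slice between the quotes; surrogate pairs are combined at decode time.

```python
# Hypothetical sketch: keep the raw slice, decode escapes only on request.
# Assumes the slice was already validated by the parser.
SIMPLE_ESCAPES = {'"': '"', '\\': '\\', '/': '/', 'b': '\b',
                  'f': '\f', 'n': '\n', 'r': '\r', 't': '\t'}

class LazyString:
    def __init__(self, raw):
        self.raw = raw  # undecoded text, without the surrounding quotes

    def value(self):
        raw = self.raw
        if '\\' not in raw:
            return raw  # fast path: nothing to decode, return the slice as-is
        out, i = [], 0
        while i < len(raw):
            if raw[i] != '\\':
                out.append(raw[i])
                i += 1
            elif raw[i + 1] != 'u':
                out.append(SIMPLE_ESCAPES[raw[i + 1]])
                i += 2
            else:
                code = int(raw[i + 2:i + 6], 16)
                i += 6
                # A high surrogate followed by \uXXXX with a low surrogate
                # encodes a single code point outside the BMP.
                if 0xD800 <= code <= 0xDBFF and raw[i:i + 2] == '\\u':
                    low = int(raw[i + 2:i + 6], 16)
                    if 0xDC00 <= low <= 0xDFFF:
                        code = 0x10000 + ((code - 0xD800) << 10) + (low - 0xDC00)
                        i += 6
                out.append(chr(code))
        return ''.join(out)
```

A caller that never touches `value()` pays nothing beyond holding the slice, and strings without a backslash are returned without any copying work.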
> No allocation works for tokenization, but parsing is a whole
> different matter.
>
>> So the
>> lowest layer should allow me to iterate across symbols in some
>> way.
>
> Yah, that would be the tokenizer.
But that will halt on comma and colon and such, correct? That's
a tad lower than I'd want, though I guess it would be easy enough
to build a parser on top of it.
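
As a rough illustration of that layering (Python for brevity; `tokens` and `events` are made-up names, and this sketch does not validate the grammar), the tokenizer below yields index triples including the punctuation, and a thin layer on top hides commas and colons and reports structure, keys and values instead:

```python
import re

# One alternative per token kind; whitespace is matched and then skipped.
TOKEN = re.compile(r'''
    (?P<ws>[ \t\r\n]+)
  | (?P<punct>[{}\[\],:])
  | (?P<string>"(?:[^"\\]|\\.)*")
  | (?P<scalar>[^\s{}\[\],:"]+)   # number, true, false, null (unchecked)
''', re.VERBOSE)

def tokens(text):
    """Lowest layer: yield (kind, start, end); nothing is decoded or copied."""
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}")
        if m.lastgroup != 'ws':
            yield m.lastgroup, m.start(), m.end()
        pos = m.end()

def events(text):
    """Layer on top of tokens(): drops ',' and ':' and emits structural events."""
    stack = []          # 'obj' or 'arr' for each open container
    expect_key = False  # True when the next string is an object key
    for kind, a, b in tokens(text):
        c = text[a]
        if kind == 'punct':
            if c == '{':
                stack.append('obj')
                expect_key = True
                yield 'object_begin', a, b
            elif c == '[':
                stack.append('arr')
                expect_key = False
                yield 'array_begin', a, b
            elif c in '}]':
                yield ('object_end' if stack.pop() == 'obj' else 'array_end'), a, b
                expect_key = False
            elif c == ',':
                expect_key = bool(stack) and stack[-1] == 'obj'
            else:  # ':'
                expect_key = False
        elif kind == 'string' and expect_key:
            expect_key = False
            yield 'key', a, b
        else:
            yield 'value', a, b
```

The consumer only slices `text[a:b]` for the events it cares about, so the upper layer stays allocation-light in the same way the tokenizer is.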
>> When I've done this in the past it was SAX-style (i.e. a callback per
>> type) but with the range interface that shouldn't be necessary.
>>
>> The parser shouldn't decode or convert anything unless I ask it to.
>> Most of the time I only care about specific values, and paying for
>> conversions on everything is wasted processing time.
>
> That's tricky. Once you scan for 2 specific characters you may
> as well scan for a couple more, the added cost is negligible.
> In contrast, scanning once for finding termination and then
> again for decoding purposes will definitely be a lot more
> expensive.
I think I'm getting a bit confused. For the JSON parser I wrote,
the parser performs full validation but leaves the content as-is,
then provides a routine to decode values from their string
representation if the user wishes to. I'm not sure where scanning
figures in here.
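
A small sketch of what I mean by "validate, but leave the content as-is", using numbers as the example. This is Python for illustration only, `RawNumber` and `decode` are hypothetical names, and the pattern follows the JSON number grammar. The value is checked once when it is parsed, kept as offsets into the input, and converted only if the caller asks:

```python
import re

# JSON number grammar: optional minus, integer part without leading zeros,
# optional fraction, optional exponent.
NUMBER = re.compile(r'-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?')

class RawNumber:
    def __init__(self, text, start, end):
        # Validation happens here, without converting anything.
        if NUMBER.fullmatch(text, start, end) is None:
            raise ValueError(f"invalid number at offset {start}")
        self.text, self.start, self.end = text, start, end

    def decode(self):
        # Conversion is deferred until a caller actually wants the value.
        s = self.text[self.start:self.end]
        return float(s) if any(c in s for c in '.eE') else int(s)
```

Values the user never looks at are validated but never converted, which is the cost model I was describing.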
> Andrei