RFC: std.json successor

Brad Roberts via Digitalmars-d digitalmars-d at puremagic.com
Sat Aug 23 12:00:37 PDT 2014


On 8/23/2014 10:46 AM, Walter Bright via Digitalmars-d wrote:
> On 8/23/2014 10:42 AM, Sönke Ludwig wrote:
>> On 23.08.2014 19:38, Walter Bright wrote:
>>> On 8/23/2014 9:36 AM, Sönke Ludwig wrote:
>>>> input types "string" and "immutable(ubyte)[]"
>>>
>>> Why the immutable(ubyte)[] ?
>>
>> I've adopted that basically from Andrei's module. The idea is to allow
>> processing data with arbitrary character encoding. However, the output
>> will always be Unicode and JSON is defined to be encoded as Unicode,
>> too, so that could probably be dropped...
>
> I feel that non-UTF encodings should be handled by adapter algorithms,
> not embedded into the JSON lexer, so yes, I'd drop that.

For performance, determining the encoding during lexing is useful. You
can avoid any conversion cost when you know whether the original string
is ASCII, UTF-8, or something else. The cost during lexing is
essentially zero, since the lexer has to look at every byte anyway. The
cost of storing that state might be a concern, or it might be free if it
fits in otherwise unused padding space. The cost of re-scanning strings
later, which this avoids, is non-trivial.
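
To make the idea concrete, here is a minimal sketch in Python of a string
lexer that records what it saw while scanning. It is an illustration
only, not code from std.json or from any proposed module; the names
StringToken, lex_string and token_text are hypothetical, and the input is
assumed to be either ASCII or UTF-8.

```python
import json
from dataclasses import dataclass


@dataclass
class StringToken:
    start: int         # index of the first byte after the opening quote
    end: int           # index of the closing quote
    is_ascii: bool     # True if no byte >= 0x80 was seen while lexing
    has_escapes: bool  # True if at least one backslash escape was seen


def lex_string(buf: bytes, pos: int) -> StringToken:
    """Scan a JSON string literal whose opening quote is at buf[pos].

    The flags are collected in the same pass that finds the closing
    quote, so no byte is visited a second time.
    """
    if pos >= len(buf) or buf[pos] != 0x22:
        raise ValueError("expected opening quote")
    is_ascii = True
    has_escapes = False
    escaped = False
    for i in range(pos + 1, len(buf)):
        b = buf[i]
        if b >= 0x80:
            is_ascii = False
        if escaped:
            escaped = False          # this byte belongs to an escape
        elif b == 0x5C:              # backslash starts an escape
            escaped = True
            has_escapes = True
        elif b == 0x22:              # unescaped closing quote
            return StringToken(pos + 1, i, is_ascii, has_escapes)
    raise ValueError("unterminated string")


def token_text(buf: bytes, tok: StringToken) -> str:
    """Turn a token into text, using the flags to pick the cheapest path."""
    if tok.has_escapes:
        # Slow path: hand the quoted literal to the standard json module.
        return json.loads(buf[tok.start - 1:tok.end + 1])
    raw = buf[tok.start:tok.end]
    if tok.is_ascii:
        return raw.decode("ascii")   # no multi-byte sequences to validate
    return raw.decode("utf-8")       # assumption: non-ASCII input is UTF-8
```

The two booleans are the "state" whose storage cost is in question; in
a compact token layout they would take two bits. A consumer that only
sees plain ASCII strings never takes the slower paths in token_text.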

My past experience with this was in an HTTP parser, where the logic is
even more complex than in JSON parsing, but the concepts still apply.

