Save JSONValue binary in file?

Sean Kelly sean at invisibleduck.org
Fri Oct 12 16:26:19 PDT 2012


On Oct 12, 2012, at 9:40 AM, Chopin <robert.bue at gmail.com> wrote:
> 
> I got this 109 MB json file that I read... and it takes over 32
> seconds for parseJSON() to finish it. So I was wondering if it
> was a way to save it as binary or something like that so I can
> read it super fast?

The performance problem arises because std.json works like a DOM parser for XML: it allocates a node per value in the JSON stream.  What we really need is something that works more like a SAX parser, with the DOM version as an optional layer built on top.  Just for kicks, I grabbed the fourth (largest) JSON blob from here:

http://www.json.org/example.html

then wrapped it in array brackets and duplicated the object until I had a ~350 MB input file, i.e.

[ paste, paste, paste, … ]
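For anyone who wants to reproduce a similar input, here is a minimal sketch of how such a file could be generated. It is not what was actually used above (that was done by hand with copy and paste); the function name writeRepeatedArray and its parameters are made up for illustration.

```d
import std.exception : enforce;
import std.stdio : File;

/// Writes "[blob,blob,...]" to outPath, repeating blob until at least
/// targetBytes of blob text has been emitted. Returns the number of copies.
/// Hypothetical helper: blob is assumed to hold one complete JSON value.
size_t writeRepeatedArray(string blob, string outPath, size_t targetBytes)
{
    enforce(blob.length > 0, "blob must not be empty");

    auto f = File(outPath, "w");   // closed automatically when f goes out of scope
    f.write("[");

    size_t copies = 0;
    size_t written = 0;
    while (written < targetBytes)
    {
        if (copies > 0)
            f.write(",");          // separator between array elements
        f.write(blob);
        written += blob.length;
        ++copies;
    }

    f.write("]");
    return copies;
}
```

Calling it with the contents of the example blob, an output path, and a target of roughly 350_000_000 bytes should give a file comparable in size to the one used here.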

Then I parsed it via this test app, based on an example in a SAX-style JSON parser I wrote in C:


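import core.stdc.stdlib : free, malloc;
import core.sys.posix.fcntl : open, O_RDONLY;
import core.sys.posix.sys.stat : fstat, stat_t;
import core.sys.posix.unistd : close, read;
import std.exception : enforce, errnoEnforce;
import std.json : parseJSON;

void main()
{
    // String literals are NUL-terminated, so they can be passed to C directly.
    auto fh = open("input.txt", O_RDONLY);
    errnoEnforce(fh >= 0, "cannot open input.txt");
    scope(exit) close(fh);

    // Size the buffer from the open descriptor.
    stat_t st;
    errnoEnforce(fstat(fh, &st) == 0, "fstat failed");
    auto sz = cast(size_t) st.st_size;
    auto buf = cast(char*) enforce(malloc(sz), "malloc failed");
    scope(exit) free(buf);

    // read() may return fewer bytes than requested, so loop until done.
    size_t got = 0;
    while (got < sz)
    {
        auto n = read(fh, buf + got, sz - got);
        errnoEnforce(n > 0, "read failed or file was truncated");
        got += n;
    }

    // parseJSON builds the whole JSONValue tree, one node per value.
    auto json = parseJSON(buf[0 .. sz]);
}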


Here are my results:


$ dmd -release -inline -O dtest
$ ll input.txt
-rw-r--r--  1 sean  staff  365105313 Oct 12 15:50 input.txt
$ time dtest

real  1m36.462s
user  1m32.468s
sys   0m1.102s
 

Then I ran my SAX-style parser example on the same input file:


$ make example
cc example.c -o example lib/release/myparser.a
$ time example

real  0m2.191s
user 0m1.944s
sys   0m0.241s


So clearly the problem isn't parsing JSON in general but rather generating an object tree for a large input stream: about 96 seconds for the tree-building D app versus about 2 seconds for the SAX-style one.  Note that the D app used gigabytes of memory to process this file (I believe the total VM footprint was around 3.5 GB, roughly ten times the input size), while my app used a fixed amount roughly equal to the size of the input file.  In short, DOM-style parsers are great for small data and terrible for large data.
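To make the SAX-style idea concrete, here is a rough sketch in D of an event-driven scanner that reports tokens to a callback and never builds a tree. This is not my C parser and not anything in std.json; the names Event, scanJSON and countScalars are invented for this example. It assumes well-formed input and does no grammar validation or string unescaping.

```d
/// Token kinds reported to the callback (hypothetical names).
enum Event { objectStart, objectEnd, arrayStart, arrayEnd, str, number, literal }

/// True for characters that end a number or a true/false/null literal.
bool isDelimiter(char ch)
{
    switch (ch)
    {
        case ' ', '\t', '\r', '\n', ',', ':', ']', '}': return true;
        default: return false;
    }
}

/// Walks text once and hands each token to sink as a slice of the input.
/// Nothing is allocated per value; assumes well-formed JSON.
void scanJSON(const(char)[] text, scope void delegate(Event, const(char)[]) sink)
{
    size_t i = 0;
    while (i < text.length)
    {
        immutable c = text[i];
        switch (c)
        {
            case '{': sink(Event.objectStart, text[i .. i + 1]); ++i; break;
            case '}': sink(Event.objectEnd,   text[i .. i + 1]); ++i; break;
            case '[': sink(Event.arrayStart,  text[i .. i + 1]); ++i; break;
            case ']': sink(Event.arrayEnd,    text[i .. i + 1]); ++i; break;
            case '"':
            {
                // Find the closing quote, stepping over backslash escapes.
                size_t j = i + 1;
                while (j < text.length && text[j] != '"')
                    j += (text[j] == '\\') ? 2 : 1;
                immutable end = j < text.length ? j : text.length;
                // Raw (still escaped) contents; object keys arrive here too.
                sink(Event.str, text[i + 1 .. end]);
                i = j + 1;
                break;
            }
            case ' ', '\t', '\r', '\n', ',', ':':
                ++i;   // whitespace and separators carry no event
                break;
            default:
            {
                // Numbers and true/false/null run until the next delimiter.
                size_t j = i;
                while (j < text.length && !isDelimiter(text[j]))
                    ++j;
                immutable isNum = c == '-' || (c >= '0' && c <= '9');
                sink(isNum ? Event.number : Event.literal, text[i .. j]);
                i = j;
                break;
            }
        }
    }
}

/// Example consumer: counts numbers and literals without retaining any of them.
size_t countScalars(const(char)[] text)
{
    size_t n = 0;
    scanJSON(text, (Event ev, const(char)[] tok) {
        if (ev == Event.number || ev == Event.literal)
            ++n;
    });
    return n;
}
```

Because the callback only ever sees slices of the input buffer, memory use stays close to the size of the input, and a DOM-style tree could still be offered as an optional consumer layered on top of the same events.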




More information about the Digitalmars-d-learn mailing list