Save JSONValue binary in file?
Sean Kelly
sean at invisibleduck.org
Fri Oct 12 16:26:19 PDT 2012
On Oct 12, 2012, at 9:40 AM, Chopin <robert.bue at gmail.com> wrote:
>
> I've got this 109 MB JSON file that I read... and it takes over 32
> seconds for parseJSON() to finish. So I was wondering if there is
> a way to save it as binary or something like that so I can read
> it super fast?
The performance problem is that std.json works like a DOM parser for XML--it allocates a node per value in the JSON stream. What we really need is something that works more like a SAX parser, with the DOM version as an optional layer built on top. Just for kicks, I grabbed the fourth (largest) JSON blob from here:
http://www.json.org/example.html
then wrapped it in array brackets and duplicated the object until I had a ~350 MB input file, i.e.
[ paste, paste, paste, … ]
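(A quick sketch of how such an input file could be produced in D--the blob filename and repetition count below are just placeholders, not what I actually used:)

import std.file : readText;
import std.stdio : File;

void main()
{
    // "blob.json" and the count of 100_000 are placeholders; adjust
    // the count until the output reaches the desired size.
    auto blob = readText("blob.json");
    auto f = File("input.txt", "w");
    f.write("[");
    foreach (i; 0 .. 100_000)
    {
        if (i) f.write(",");
        f.write(blob);
    }
    f.write("]");
}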
Then I parsed it via this test app, based on an example in a SAX-style JSON parser I wrote in C:
import core.stdc.stdlib;
import core.sys.posix.unistd;
import core.sys.posix.sys.stat;
import core.sys.posix.fcntl;
import std.json;

void main()
{
    // Mutable, NUL-terminated copy of the file name for the POSIX calls.
    auto filename = "input.txt\0".dup;

    // Find the file size so the whole file can be read in one shot.
    stat_t st;
    stat(filename.ptr, &st);
    auto sz = st.st_size;

    // Slurp the file into a malloc'd buffer (error checking omitted;
    // this is just a benchmark).
    auto buf = cast(char*) malloc(sz);
    auto fh = open(filename.ptr, O_RDONLY);
    read(fh, buf, sz);
    close(fh);

    // Parse the buffer into a JSONValue object tree.
    auto json = parseJSON(buf[0 .. sz]);
}
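(Incidentally, the POSIX calls aren't essential here; a minimal sketch using only the standard library would look like the following. Timings would presumably be similar, since parseJSON dominates the run.)

import std.file;
import std.json;

void main()
{
    // Read the whole file into memory, then build the JSONValue tree.
    auto buf = cast(char[]) read("input.txt");
    auto json = parseJSON(buf);
}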
Here are my results:
$ dmd -release -inline -O dtest
$ ll input.txt
-rw-r--r-- 1 sean staff 365105313 Oct 12 15:50 input.txt
$ time dtest
real 1m36.462s
user 1m32.468s
sys 0m1.102s
Then I ran my SAX-style parser example on the same input file:
$ make example
cc example.c -o example lib/release/myparser.a
$ time example
real 0m2.191s
user 0m1.944s
sys 0m0.241s
So clearly the problem isn't parsing JSON in general but rather generating an object tree for a large input stream. Note that the D app used gigabytes of memory to process this file--I believe the total VM footprint was around 3.5 GB--while my app used a fixed amount roughly equal to the size of the input file. In short, DOM-style parsers are great for small data and terrible for large data.
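(To make the distinction concrete: a SAX-style JSON API fires callbacks per token rather than allocating a tree. The names below are purely illustrative--they don't correspond to my parser or any existing library:)

// Illustrative only: the kind of callback table a SAX-style JSON
// parser might accept. Nothing is allocated per value; the parser
// simply invokes these as it scans the input.
struct JsonHandler
{
    void delegate() objectStart;
    void delegate() objectEnd;
    void delegate() arrayStart;
    void delegate() arrayEnd;
    void delegate(const(char)[] name) key;
    void delegate(const(char)[] text) stringValue;
    void delegate(double n) numberValue;
    void delegate(bool b) boolValue;
    void delegate() nullValue;
}

// Hypothetical entry point: one pass over the buffer, firing callbacks.
// void parseJsonSax(const(char)[] input, ref JsonHandler h);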