Fastest JSON parser in the world is a D project

Marco Leise via Digitalmars-d-announce digitalmars-d-announce at puremagic.com
Sat Oct 17 04:30:47 PDT 2015


On Sat, 17 Oct 2015 08:29:24 +0000,
Ola Fosheim Grøstad
<ola.fosheim.grostad+dlang at gmail.com> wrote:

> On Saturday, 17 October 2015 at 08:20:33 UTC, Daniel N wrote:
> > On Saturday, 17 October 2015 at 08:07:57 UTC, Martin Nowak 
> > wrote:
> >> On Wednesday, 14 October 2015 at 07:35:49 UTC, Marco Leise 
> >> wrote:
> >>>   - Data size limited by available contiguous virtual memory
> >>
> >> Mmaping files for sequential reading is a very debatable 
> >> choice, b/c the common use case is to read a file once. You 
> >> should at least compare the numbers w/ drop_caches between 
> >> each run.

With drop_caches between each run (a way to do that is
sketched below), the results would be:
* Memory usage is then fixed at slightly more than the file
  size, whereas it often stays below that when the disk cache
  is warm.
* It would still be faster than copying the whole file into a
  separate memory block.
* Depending on whether the benchmark system uses an HDD or an
  SSD, the numbers may be rendered meaningless by a two-second
  wait on I/O.
* Reading a file once is the common case, yes, but it is also
  possible that you read JSON that has just been saved and is
  therefore still in the cache.
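
Dropping the cache between runs could be scripted from D along
these lines (requires root; just a sketch, not part of any
benchmark code here):

import std.file : write;
import std.process : executeShell;

// Drop the Linux page cache so the next run reads from disk again.
void dropCaches()
{
    executeShell("sync");                   // flush dirty pages first
    write("/proc/sys/vm/drop_caches", "3"); // 3 = page cache + dentries + inodes
}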

> > It's a sensible choice together with appropriate madvise().

Obviously agreed :). It's just that in practice (on my HDD
system) it never made a difference for I/O-bound sequential
reads, so I removed posix_madvise.
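
The kind of hint in question looks roughly like this (a sketch
assuming a POSIX system and POSIX_MADV_SEQUENTIAL; not the
exact code the parser had):

import core.sys.posix.sys.mman : posix_madvise, POSIX_MADV_SEQUENTIAL;
import std.mmfile : MmFile;

// Tell the kernel the mapped file will be read front to back,
// which encourages more aggressive read-ahead.
void adviseSequentialRead(MmFile mmf)
{
    void[] data = mmf[];
    posix_madvise(data.ptr, data.length, POSIX_MADV_SEQUENTIAL);
}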

> Mmap is very expensive, as it affects all cores, you need a 
> realistic multithreaded async benchmark on smaller files to see 
> the effect.

That's valuable information. It is trivial to read into an
allocated block when the file size is below some threshold; I
would just need a rough figure for that cut-off. Are you
talking about 4K pages or megabytes? 64 KiB maybe?
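
Something along these lines would do; just a sketch with
placeholder names (withFileContents, the 64 KiB cut-off), not
code from the parser:

import std.file : getSize, read;
import std.mmfile : MmFile;
import std.stdio : writeln;

enum size_t mmapThreshold = 64 * 1024; // placeholder cut-off, to be benchmarked

// Read small files into an allocated block, mmap large ones,
// and hand the contents to a callback either way.
void withFileContents(string path, void delegate(const(char)[]) process)
{
    if (getSize(path) < mmapThreshold)
    {
        // Small file: one read() into a GC-allocated block, no page-table churn.
        process(cast(const(char)[]) read(path));
    }
    else
    {
        // Large file: map it and unmap again when done.
        auto mmf = new MmFile(path);
        scope (exit) destroy(mmf);
        process(cast(const(char)[]) mmf[]);
    }
}

void main(string[] args)
{
    if (args.length > 1)
        withFileContents(args[1],
            delegate(const(char)[] json) { writeln(json.length, " bytes"); });
}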

-- 
Marco


