Why is Json parsing slower in multiple threads?
FeepingCreature
feepingcreature at gmail.com
Tue Jun 20 11:57:59 UTC 2023
On Tuesday, 20 June 2023 at 09:31:57 UTC, Alexandre Bourquelot
wrote:
> Hello everyone. We have some D code running in production that
> reads files containing lines of JSON data, that we would like
> to parse and process.
>
> These files can be processed in parallel, so we create one
> thread for processing each file. However I noticed significant
> slowdowns when processing multiple files in parallel, as
> opposed to processing only one file.
>
> ...
>
> Thanks in advance, this has been annoying me for a couple of
> days and I have no idea what might be the problem. Strangely
> enough I also have the same problem when using `vibe-d` json
> library for parsing.
Yeah, if you profile with `perf record`, you will see that the
program spends nearly all of its runtime in the garbage
collector. JSON parsing is very allocation-heavy, and the GC
allocator takes a lock. So the threads end up waiting on each
other instead of running in parallel, and on top of that you pay
the overhead of lots and lots of lock waits.
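If you want a rough check without `perf`, something like the
sketch below counts GC cycles and allocated bytes around a
parsing loop. It is only an illustration: `GcDelta`,
`measureParse` and the sample record are made-up names and data,
not taken from the original program, and it assumes a druntime
recent enough to provide `GC.profileStats` and
`GC.stats().allocatedInCurrentThread`.

```d
import core.memory : GC;
import std.json : parseJSON;
import std.stdio : writefln;

/// GC activity observed while running a workload (hypothetical helper type).
struct GcDelta
{
    size_t collections;   // GC cycles that ran during the workload
    ulong bytesAllocated; // bytes the calling thread allocated from the GC
}

/// Parses `line` `iterations` times with std.json and reports GC activity.
GcDelta measureParse(string line, size_t iterations)
{
    const collectionsBefore = GC.profileStats().numCollections;
    const bytesBefore = GC.stats().allocatedInCurrentThread;

    foreach (i; 0 .. iterations)
    {
        // Each call builds a fresh GC-allocated JSONValue tree.
        auto value = parseJSON(line);
        assert(value["id"].integer == 1);
    }

    return GcDelta(
        GC.profileStats().numCollections - collectionsBefore,
        GC.stats().allocatedInCurrentThread - bytesBefore);
}

void main()
{
    // Made-up sample record, standing in for one line of the real files.
    enum sample = `{"id": 1, "name": "example", "tags": ["a", "b", "c"]}`;
    const delta = measureParse(sample, 100_000);
    writefln("collections: %s, bytes allocated: %s",
        delta.collections, delta.bytesAllocated);
}
```

A large byte count per parsed line suggests the allocator is
being hit hard, which is where the threads would contend.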
I recommend using a streaming JSON parser like std_data_json
(https://github.com/dlang-community/std_data_json) and loading
the data directly into a well-typed data structure, rather than
building a generic JSON value tree first. That keeps the number
of unnecessary allocations to a minimum.