Why is Json parsing slower in multiple threads?

FeepingCreature feepingcreature at gmail.com
Tue Jun 20 11:57:59 UTC 2023


On Tuesday, 20 June 2023 at 09:31:57 UTC, Alexandre Bourquelot 
wrote:
> Hello everyone. We have some D code running in production that 
> reads files containing lines of JSON data, that we would like 
> to parse and process.
>
> These files can be processed in parallel, so we create one 
> thread for processing each file. However I noticed significant 
> slowdowns when processing multiple files in parallel, as 
> opposed to processing only one file.
>
> ...
> 
> Thanks in advance, this has been annoying me for a couple of 
> days and I have no idea what might be the problem. Strangely 
> enough I also have the same problem when using `vibe-d` json 
> library for parsing.

Yeah, if you profile it with `perf record`, you will see that the 
program spends nearly all of its runtime in the garbage 
collector. JSON parsing is very allocation-heavy, and the GC 
allocator takes a global lock. So the threads end up serialized 
on allocation: you get no parallelism, and on top of that you pay 
the overhead of lots and lots of lock waits.
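
To check this without the production code, something like the 
following rough sketch could be used. The JSON line, the iteration 
count and the helper names (`parseLines`, `runWithThreads`) are 
made up for illustration; it only uses Phobos `std.json` and 
druntime. Every thread does the same amount of work, so with 
perfect scaling the wall time would stay roughly flat as threads 
are added.

```d
import core.memory : GC;
import core.thread : Thread;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.json : parseJSON;
import std.stdio : writefln;

// Hypothetical workload: one small JSON line, parsed over and over.
enum line = `{"id": 1, "name": "example", "tags": ["a", "b", "c"]}`;
enum linesPerThread = 200_000;

void parseLines()
{
    foreach (i; 0 .. linesPerThread)
    {
        // Each call builds a fresh tree of JSONValue nodes on the GC heap.
        auto value = parseJSON(line);
        assert(value["id"].integer == 1);
    }
}

// Runs `threadCount` threads that each parse `linesPerThread` lines
// and returns the elapsed wall time in milliseconds.
long runWithThreads(size_t threadCount)
{
    auto sw = StopWatch(AutoStart.yes);
    Thread[] threads;
    foreach (i; 0 .. threadCount)
    {
        auto t = new Thread(&parseLines);
        t.start();
        threads ~= t;
    }
    foreach (t; threads)
        t.join();
    return sw.peek.total!"msecs";
}

void main()
{
    foreach (n; [1, 2, 4, 8])
    {
        const before = GC.profileStats();
        const ms = runWithThreads(n);
        const after = GC.profileStats();
        writefln("%s thread(s): %s ms wall time, %s GC collections",
            n, ms, after.numCollections - before.numCollections);
    }
}
```

If the wall time grows with the thread count instead of staying 
flat, that points at contention in the allocator rather than at 
the parsing itself. Running the binary with 
`--DRT-gcopt=profile:1` additionally prints a GC summary at exit.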

I recommend using a streaming JSON parser like std_data_json 
(https://github.com/dlang-community/std_data_json) and loading 
the data directly into a well-typed data structure, to keep the 
number of unnecessary allocations to a minimum.
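
A rough sketch of what that could look like, assuming the 
std_data_json dub package. The `Record` struct and `parseRecord` 
are hypothetical, and the library member names used here 
(`parseJSONStream`, `JSONParserNodeKind`, `key`, `literal`, 
`longValue`, `value`) are written from memory and should be 
checked against the library's documentation:

```d
// Assumption: std_data_json is available as a dub dependency and
// exposes these names from stdx.data.json.
import stdx.data.json : JSONParserNodeKind, JSONTokenKind, parseJSONStream;

// Hypothetical layout of one input line.
struct Record
{
    long id;
    string name;
}

Record parseRecord(string line)
{
    Record result;
    string currentKey;
    int depth; // 1 while inside the top-level object

    // Walk the parser events lazily; no intermediate value tree is built.
    foreach (node; parseJSONStream(line))
    {
        switch (node.kind)
        {
            case JSONParserNodeKind.objectStart:
            case JSONParserNodeKind.arrayStart:
                depth++;
                break;
            case JSONParserNodeKind.objectEnd:
            case JSONParserNodeKind.arrayEnd:
                depth--;
                break;
            case JSONParserNodeKind.key:
                if (depth == 1)
                    currentKey = node.key;
                break;
            case JSONParserNodeKind.literal:
                // Ignore literals inside nested arrays or objects.
                if (depth != 1)
                    break;
                if (currentKey == "id" && node.literal.kind == JSONTokenKind.number)
                    result.id = node.literal.number.longValue;
                else if (currentKey == "name" && node.literal.kind == JSONTokenKind.string)
                    result.name = node.literal.string.value;
                break;
            default:
                break;
        }
    }
    return result;
}
```

Only the fields that are actually needed get stored, so fields 
the program does not care about never turn into heap-allocated 
nodes, which is where most of the GC pressure comes from.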


More information about the Digitalmars-d mailing list