Why is JSON parsing slower in multiple threads?
Alexandre Bourquelot
alexandre.bourquelot at ahrefs.com
Tue Jun 20 09:31:57 UTC 2023
Hello everyone. We have some D code running in production that
reads files containing lines of JSON data, which we parse and
process.
These files can be processed in parallel, so we create one thread
per file. However, I noticed that each file takes significantly
longer to process when several files are processed in parallel
than when only one file is processed.
Here is a minimal snippet that reproduces the issue. It reads N
files (`./file1` to `./fileN`, where N is the command-line
argument), each containing the same JSON object copy-pasted 100k
times, one per line, like so:
```json
{ "s" : "string", "i" : 42}
{ "s" : "string", "i" : 42}
{ "s" : "string", "i" : 42}
...
```
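The input files themselves are not included here. A hypothetical
helper along these lines should produce equivalent ones: it writes
`./file1` to `./fileN`, each with 100,000 copies of the line
above. The helper name `writeTestFile` and the default of three
files are assumptions, not part of the original setup.

```d
// Hypothetical helper (not part of the original post): writes the
// input files ./file1 .. ./fileN that the repro program reads.
import std.conv : to;
import std.stdio : File;

// The JSON object from the post, one copy per line.
enum jsonLine = `{ "s" : "string", "i" : 42}`;

// Writes `count` copies of jsonLine to `path`, one per line.
void writeTestFile(string path, size_t count)
{
    auto f = File(path, "w");
    foreach (_; 0 .. count)
        f.writeln(jsonLine);
    f.close();
}

void main(string[] args)
{
    // Number of files to create; the default of 3 is an assumption.
    immutable n = args.length > 1 ? args[1].to!int : 3;
    foreach (k; 1 .. n + 1)
        writeTestFile("./file" ~ k.to!string, 100_000);
}
```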
Running the snippet with one file and then with three files gives
the following output:
```
➜ ./test 1
(file ) (thread id 140310703728384) starting processing file
(file )Done in 1 sec, 549 ms, 257 μs, and 6 hnsecs
➜ ./test 3
(file ) (thread id 140071550318336) starting processing file
(file ) (thread id 140078235236096) starting processing file
(file ) (thread id 140078221063936) starting processing file
(file )Done in 4 secs, 296 ms, 780 μs, and 9 hnsecs
(file )Done in 4 secs, 360 ms, 498 μs, and 3 hnsecs
(file )Done in 4 secs, 393 ms, 342 μs, and 6 hnsecs
```
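To put these numbers in perspective (taking these single runs as
representative, and using the slowest of the three threads):

$$
\frac{4.393\,\text{s}}{1.549\,\text{s}} \approx 2.8,
\qquad
\frac{3 \times 1.549\,\text{s}}{4.393\,\text{s}}
= \frac{4.647\,\text{s}}{4.393\,\text{s}} \approx 1.06
$$

That is, each file takes roughly 2.8 times longer when three are
parsed at once, and the three threads together finish only about
6% sooner than three back-to-back single-file runs would be
expected to.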
Another curious thing is that this behaviour does not occur when
building the code with dub's `--build=profile` option.
For reference:
```bash
➜ ldc2 --version
LDC - the LLVM D compiler (1.24.0):
based on DMD v2.094.1 and LLVM 11.0.1
```
```d
import core.thread.osthread; // Thread, thisThreadID
import std.conv;             // to
import std.datetime.systime : Clock;
import std.json;             // parseJSON
import std.stdio;            // File, writefln, writeln

// Parses every line of one file and reports how long it took.
void parseInThread(string[] lines)
{
    writefln("(file %s) (thread id %s) starting processing file",
        "", thisThreadID);
    auto startTime = Clock.currTime;
    foreach (line; lines)
    {
        // The parsed value is discarded; only parsing cost is measured.
        line.parseJSON;
    }
    writefln("(file %s )Done in %s", "", Clock.currTime - startTime);
}

// Owns a private copy of one file's lines and the thread parsing them.
class T
{
    Thread t_;
    string[] _lines;

    this(string[] lines)
    {
        _lines = lines.dup;
        t_ = new Thread(() { parseInThread(_lines); });
    }

    void opCall()
    {
        t_.start;
    }

    void join()
    {
        t_.join;
    }
}

int main(string[] args)
{
    if (args.length < 2)
    {
        writeln("usage: test <number of files>");
        return 1;
    }

    T[] threads;
    string filenameBase = "./file";
    // Read ./file1 .. ./fileN fully into memory before starting threads.
    foreach (k; 1 .. args[1].to!int + 1)
    {
        auto newFile = File(filenameBase ~ k.to!string, "r");
        string[] lines;
        foreach (line; newFile.byLine)
        {
            // byLine reuses its buffer, so each line must be copied.
            lines ~= line.to!string;
        }
        newFile.close;
        threads ~= new T(lines);
    }
    // Start all threads, then wait for all of them to finish.
    foreach (thread; threads)
    {
        thread();
    }
    foreach (thread; threads)
    {
        thread.join;
    }
    return 0;
}
```
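A variant that avoids input files altogether might look like the
sketch below. It builds 100k in-memory copies of the same line for
each thread and times only the parsing loop with
`std.datetime.stopwatch.StopWatch`. The names `parseAll` and
`makeWorker` are hypothetical, and it is only assumed to exercise
the same parsing path as the program above.

```d
// Hypothetical self-contained variant of the repro: each thread parses
// 100k in-memory copies of the same JSON line instead of reading ./fileN.
import core.thread;
import std.conv : to;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.json : parseJSON;
import std.stdio : writefln;

// Same JSON object as in the input files.
enum jsonLine = `{ "s" : "string", "i" : 42}`;

// Parses every line and returns how many had "i" == 42.
size_t parseAll(string[] lines)
{
    size_t matched;
    foreach (line; lines)
    {
        auto value = parseJSON(line);
        if (value["i"].integer == 42)
            ++matched;
    }
    return matched;
}

// Helper so each thread captures its own `lines`, not a loop variable.
Thread makeWorker(string[] lines)
{
    return new Thread(() {
        auto sw = StopWatch(AutoStart.yes);
        immutable n = parseAll(lines);
        writefln("(thread id %s) parsed %s lines in %s",
            thisThreadID, n, sw.peek);
    });
}

void main(string[] args)
{
    // Number of threads; defaults to 1 when no argument is given.
    immutable threadCount = args.length > 1 ? args[1].to!int : 1;

    Thread[] workers;
    foreach (_; 0 .. threadCount)
    {
        auto lines = new string[](100_000);
        lines[] = jsonLine; // fill every slot with the same line
        workers ~= makeWorker(lines);
    }
    foreach (w; workers)
        w.start();
    foreach (w; workers)
        w.join();
}
```

Run it as `./variant 1` and `./variant 3` to compare the per-thread
timings in the same way as the original program.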
Thanks in advance; this has been bothering me for a couple of days
and I have no idea what the problem might be. Strangely enough, I
see the same slowdown when parsing with the `vibe-d` JSON library.
More information about the Digitalmars-d
mailing list