Why is JSON parsing slower in multiple threads?

Alexandre Bourquelot alexandre.bourquelot at ahrefs.com
Tue Jun 20 09:31:57 UTC 2023


Hello everyone. We have some D code running in production that 
reads files containing lines of JSON data, which we would like to 
parse and process.

These files can be processed in parallel, so we create one thread 
per file. However, I noticed significant slowdowns when processing 
multiple files in parallel compared with processing a single file: 
in the run below, each file takes about 4.3 s with three threads 
versus about 1.5 s with one.

Here is a simple code snippet reproducing the issue. Each input 
file (`./file1`, `./file2`, ...) contains the same JSON object 
copy-pasted on 100k lines, like so:
```json
{ "s" : "string", "i" : 42}
{ "s" : "string", "i" : 42}
{ "s" : "string", "i" : 42}
...
```
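For anyone wanting to reproduce this, the input files can be generated with a short D program. This is a minimal sketch, not part of the original report: it assumes the `./file1` ... `./fileN` naming that the test program below reads, and the helper name `writeInput` and the default of 3 files are illustrative choices.

```d
import std.conv : to;
import std.stdio : File;

/// The JSON record from the report, repeated on every line.
enum jsonLine = `{ "s" : "string", "i" : 42}`;

/// Hypothetical helper: writes `count` copies of `jsonLine`, one per line, to `path`.
void writeInput(string path, size_t count)
{
    auto f = File(path, "w");
    foreach (_; 0 .. count)
        f.writeln(jsonLine);
    // `f` is reference-counted; it is flushed and closed when it goes out of scope.
}

void main(string[] args)
{
    // Number of files to create (./file1 .. ./fileN); defaults to 3 (an assumption).
    immutable n = args.length > 1 ? args[1].to!int : 3;
    foreach (k; 1 .. n + 1)
        writeInput("./file" ~ k.to!string, 100_000);
}
```

Running the generator with the same argument later passed to the test program (e.g. `3`) creates exactly the files it opens.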

It gives the following output:
```
➜ ./test 1
(file ) (thread id 140310703728384) starting processing file
(file  )Done in 1 sec, 549 ms, 257 μs, and 6 hnsecs

➜ ./test 3
(file ) (thread id 140071550318336) starting processing file
(file ) (thread id 140078235236096) starting processing file
(file ) (thread id 140078221063936) starting processing file
(file  )Done in 4 secs, 296 ms, 780 μs, and 9 hnsecs
(file  )Done in 4 secs, 360 ms, 498 μs, and 3 hnsecs
(file  )Done in 4 secs, 393 ms, 342 μs, and 6 hnsecs
```
Another curious thing is that this behaviour does not occur when 
building the code with dub's `--build=profile` option.

For reference:
```bash
➜ ldc2 --version
LDC - the LLVM D compiler (1.24.0):
   based on DMD v2.094.1 and LLVM 11.0.1
```

```d
import core.thread.osthread : Thread;
import std.conv : to;
import std.datetime.systime : Clock;
import std.json : parseJSON;
import std.process : thisThreadID;
import std.stdio : File, writefln;

/// Parses every line as JSON and reports the elapsed wall-clock time.
void parseInThread(string[] lines)
{
    writefln("(file %s) (thread id %s) starting processing file",
        "", thisThreadID);

    auto startTime = Clock.currTime;

    foreach (line; lines)
    {
        // The result is discarded; only the parsing cost is measured.
        line.parseJSON;
    }

    writefln("(file %s )Done in %s", "", Clock.currTime - startTime);
}

/// Owns one file's lines and the thread that parses them.
class T
{
    Thread t_;
    string _filename;
    string[] _lines;

    this(string[] lines)
    {
        _lines = lines.dup;
        t_ = new Thread(() { parseInThread(_lines); });
    }

    void opCall()
    {
        t_.start;
    }

    void join()
    {
        t_.join;
    }
}

int main(string[] args)
{
    T[] threads;

    // Input files are ./file1, ./file2, ..., ./fileN, where N = args[1].
    string filenameBase = "./file";
    foreach (k; 1 .. args[1].to!int + 1)
    {
        auto v = filenameBase ~ k.to!string;

        auto newFile = File(v, "r");

        string[] lines;

        // byLine reuses its buffer, so each line is copied with to!string.
        foreach (line; newFile.byLine)
        {
            lines ~= (line.to!string);
        }
        newFile.close;

        threads ~= new T(lines);
    }

    // Start every thread before joining any, so the files are parsed concurrently.
    foreach (thread; threads)
    {
        thread();
    }

    foreach (thread; threads)
    {
        thread.join;
    }

    return 0;
}
```

Thanks in advance; this has been bothering me for a couple of days 
and I have no idea what the problem might be. Strangely enough, I 
see the same slowdown when using the `vibe-d` JSON library for 
parsing.


More information about the Digitalmars-d mailing list