fork/exec performance problem?
rikki cattermole via Digitalmars-d
digitalmars-d at puremagic.com
Tue Jun 14 20:51:28 PDT 2016
On 15/06/2016 9:59 AM, Johan Holmberg via Digitalmars-d wrote:
> Hi!
>
> I'm trying to write a simple D program to emulate "parallel -u -jN", ie.
> running a number of commands in parallel to take advantage of a
> multicore machine (I'm testing on a 24-core Ubuntu machine).
>
> I have written almost equivalent programs in C++ and D, and hoped that
> they should run equally fast. But the performance of the D version
> degrades when the number of commands increase, and I don't understand
> why. Maybe I'm using D incorrectly? Or is it the garbage collector that
> kicks in (even if I hope I don't allocate much memory after the initial
> setup)?
>
> My first testcase consisted of a file with 85000 C/C++ compilation
> commands, to be run 24 in parallel. Most source files are really small
> (different modules in the runtime library of a C/C++ compiler for
> embedded development, built in different flavors).
>
> If I invoke the D program 9 times with around 10000 (85000/9 to be
> exact) commands each time, it performs almost on par with the C++
> version. But with all 85000 files in one invocation, the D version takes
> 1.5 times as long (6min 30s --> 10min).
>
> My programs (C++ and D) are really simple:
>
> 1) read all commands from STDIN into an array in the program
> 2) iterate over the array and keep N programs running at all times
> 3) start new programs with "fork/exec"
> 4) wait for finished programs with "waitpid"
>
> If I compare the start of an 85000-run and a 10000-run, the 85000-run is
> slower right from the start. I don't understand why. The only difference
> must be that the 85000-run has allocated a bigger array.
>
> My D program can be viewed at:
>
>
> https://bitbucket.org/holmberg556/examples/src/79ef65e389346e9957c535b77201a829af9c62f2/parallel_exec/parallel_exec_dlang.d
>
> Any help would be appreciated.
>
> /Johan Holmberg
This is more appropriate for D.learn.

A few things: disable the GC and force a collect in that while loop.

Next, you're allocating heavily. I would recommend replacing the
commands variable with some form of 'smarter' array, basically
allocating in blocks instead of appending one element at a time. I'm
not sure Appender is quite what you want here, so it would have to be
home-made, so to speak.
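Something along these lines would do, just as a rough sketch (the
BlockArray name and the block size are made up here for illustration,
not anything from Phobos):

struct BlockArray {
    enum blockSize = 4096;
    private string[] data;
    private size_t len;

    // Grow the backing array one block at a time instead of
    // one element at a time.
    void put(string s) {
        if (len == data.length)
            data.length += blockSize;
        data[len++] = s;
    }

    string[] items() { return data[0 .. len]; }
}

The read loop then becomes commands.put(line.idup) and the rest of the
program works off commands.items().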
My revised edition:
import std.conv;
import std.stdio;
import std.string;
import std.process;
import core.stdc.stdlib;
import core.sys.posix.unistd;
import core.sys.posix.sys.wait;
int process_start(string cmdline) {
    int pid = fork();
    if (pid == -1) {
        perror("fork");
        exit(1);
    }
    else if (pid == 0) {
        // Child: run the command line through the shell.
        // execvp expects a null-terminated array of C strings.
        const(char)*[4] argv = ["sh".toStringz, "-c".toStringz,
                                cmdline.toStringz, null];
        execvp(argv[0], argv.ptr);
        _exit(126);
    }
    else {
        return pid;
    }
    assert(0);
}
void process_wait(out int pid, out int status) {
    // Wait for any child in our process group to finish.
    pid = waitpid(0, &status, 0);
    if (pid == -1) {
        perror("waitpid");
        exit(1);
    }
}
int main(string[] argv) {
    import core.memory;
    GC.disable;

    int maxrunning = 1;
    if (argv.length > 1) {
        maxrunning = to!int(argv[1]);
    }
    bool verbose = (argv.length > 2);

    string[] commands;
    foreach (line; stdin.byLine()) {
        commands ~= line.idup;
    }

    if (verbose) {
        writeln("#parallel = ", maxrunning);
        writeln("#commands = ", commands.length);
        stdout.flush();
    }

    int next = 0;
    int nrunning = 0;
    while (next < commands.length || nrunning > 0) {
        // Keep up to maxrunning commands going at once.
        while (next < commands.length && nrunning < maxrunning) {
            process_start(commands[next]);
            next++;
            nrunning++;
        }
        int pid;
        int exitstatus;
        process_wait(pid, exitstatus);
        nrunning--;
        if (exitstatus != 0) {
            writeln("ERROR: ...");
            exit(1);
        }
        GC.collect;
    }
    return 0;
}
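To try it out, something along these lines should work (the file name
is just an example; any extra argument after the job count turns on the
verbose output):

dmd -O -release parallel_exec_dlang.d
./parallel_exec_dlang 24 < commands.txt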