A possible GC optimization
bearophile
bearophileHUGS at lycos.com
Thu May 7 15:41:31 PDT 2009
Among the new things done for Python 2.7 (currently in alpha) they have improved the GC:
http://bugs.python.org/issue4074
http://bugs.python.org/issue4688
(If I understand those patches correctly, the cyclic GC stops tracking tuples and dicts whose items can't take part in reference cycles, so it doesn't have to traverse them at every collection.) There they show a small Python program named "tuple_gc_hell.py"; this is a reduced version:
import time, gc

def list_of_tuples(n):
    m = n // 10
    l = []
    prev_time = time.time()
    for i in xrange(n):
        if i % m == 0:
            cur_time = time.time()
            print i, round(cur_time - prev_time, 2)
            prev_time = cur_time
        l.append((i % 2, i % 12))

#gc.disable()
#import psyco; psyco.full()
list_of_tuples(10 * 1000 * 1000)
Its output:
0 0.0
1000000 1.2
2000000 2.58
3000000 3.98
4000000 4.89
5000000 6.64
6000000 8.05
7000000 9.44
8000000 9.84
9000000 12.2
Total running time: 73.3 s
Disabling the GC and using the Psyco JIT compiler:
0 0.0
1000000 0.17
2000000 0.19
3000000 0.17
4000000 0.19
5000000 0.16
6000000 0.17
7000000 0.17
8000000 0.17
9000000 0.19
Total running time: 2.67 s
Despite Python being a very different language, with a very different GC (reference counting plus a cyclic collector), I have tried a similar D program (compiled with DMD v2.029):
import std.c.time: clock; // where's CLOCKS_PER_SEC?
import std.stdio: writeln;
//import std.gc: disable;
//import core.memory: disable; // where's the disable?

void array_of_pairs(int n) {
    class S {
        int x, y;
        this(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    int m = n / 10;
    S[] l;
    auto prev_time = clock();

    for (int i; i < n; i++) {
        if (i % m == 0) {
            auto cur_time = clock();
            writeln(i, " ", (cur_time - prev_time) / 1000.0);
            prev_time = cur_time;
        }
        l ~= new S(i % 2, i % 12);
    }
}

void main() {
    //disable();
    array_of_pairs(10_000_000);
}
Its output is not good:
0 0
1000000 1.594
2000000 3.563
3000000 6.156
4000000 8.5
5000000 9.469
6000000 10.047
7000000 13.453
8000000 18.703
9000000 24.203
Total running time: 122.3 s
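(About the commented-out disable() above: in D2 the collector seems to be controlled through core.memory's GC struct rather than std.gc. A minimal sketch, assuming GC.disable/GC.enable/GC.collect work as in the current druntime; the exact names around DMD 2.029 may differ:)

import core.memory: GC;

void main() {
    GC.disable();   // suspend automatic collections during the allocation-heavy phase
    // ... allocate the millions of small objects here ...
    GC.enable();    // let the collector run again
    GC.collect();   // and trigger one collection by hand
}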
In DMD 2.029 array appends are very slow, so in a later version I have created the array up-front. I have also seen that pulling the class out of the array_of_pairs function speeds up the program significantly. To improve the situation even more I have used a struct instead of a class:
import std.c.time: clock;
import std.stdio: writeln;

struct S {
    int x, y;
    this(int x, int y) {
        this.x = x;
        this.y = y;
    }
}

void array_of_pairs(int n) {
    int m = n / 10;
    auto l = new S*[n];
    auto prev_time = clock();

    for (int i; i < n; i++) {
        if (i % m == 0) {
            auto cur_time = clock();
            writeln(i, " ", (cur_time - prev_time) / 1000.0);
            prev_time = cur_time;
        }
        l[i] = new S(i % 2, i % 12);
    }
}

void main() {
    //disable();
    array_of_pairs(10_000_000);
}
Its output:
0 0
1000000 0.266
2000000 0.625
3000000 0.547
4000000 0.641
5000000 0.765
6000000 0.891
7000000 1.031
8000000 1.172
9000000 1.328
Total running time: 9.46 s
Now it's much faster than Python (with its GC enabled and without the JIT), but the GC still shows troubles.
Just to be sure it's not a D2-specific problem I have done the following test with DMD v1.042 and Phobos 1:
import std.c.time: clock, CLOCKS_PER_SEC;
import std.stdio: writefln;
import std.gc: disable;

struct S { int x, y; }

void list_of_tuples(int n) {
    int m = n / 10;
    auto l = new S*[n];
    auto prev_time = clock();

    for (int i; i < n; i++) {
        if (i % m == 0) {
            auto cur_time = clock();
            writefln(i, " ", (cur_time - prev_time) / cast(float)CLOCKS_PER_SEC);
            prev_time = cur_time;
        }
        l[i] = new S;
        l[i].x = i % 2;
        l[i].y = i % 12;
    }
}

void main() {
    disable();
    list_of_tuples(10_000_000);
}
Its output:
0 0
1000000 0.297
2000000 0.344
3000000 0.359
4000000 0.391
5000000 0.421
6000000 0.438
7000000 0.484
8000000 0.5
9000000 0.532
Total running time: 4.96 s
About twice as fast.
Now that Phobos 2 has Tuple, people will use it, and many such tuples will not contain references, so I think an optimization similar to the one done for Python 2.7 may become useful.
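Something like this shows the idea done by hand (just a sketch; I'm assuming core.memory exposes GC.malloc and GC.BlkAttr.NO_SCAN as in the current druntime, and that Tuple can be constructed this way): a Tuple!(int, int) holds no references, so the GC has no reason to ever scan its memory block for pointers:

import std.typecons: Tuple;
import core.memory: GC;

alias Tuple!(int, int) Pair; // holds no references

void main() {
    // manual version of the idea: put the pair in a block
    // that the collector never scans for pointers
    auto p = cast(Pair*)GC.malloc(Pair.sizeof, GC.BlkAttr.NO_SCAN);
    *p = Pair(1, 5);
}

An automatic version of this, done by the compiler/runtime for reference-free structs and tuples, is what I'm suggesting.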
Bye,
bearophile