Programming language benchmarks
bearophile
bearophileHUGS at lycos.com
Thu Apr 28 04:50:19 PDT 2011
Moritz Warning:
> I think this might be interesting; it's also using D.
I was about to post this :-)
The little benchmark:
http://attractivechaos.wordpress.com/2011/04/25/my-programming-language-benchmarks-plb/
http://attractivechaos.github.com/plb/
(The matrix mul code is not a realistic example of code because people use library code to do this operation.)
Original C code:
double **mm_mul(int n, double *const *a, double *const *b)
{
int i, j, k;
double **m, **c;
m = mm_init(n); c = mm_init(n);
for (i = 0; i < n; ++i) // transpose
for (j = 0; j < n; ++j)
c[i][j] = b[j][i];
for (i = 0; i < n; ++i) {
double *p = a[i], *q = m[i];
for (j = 0; j < n; ++j) {
double t = 0.0, *r = c[j];
for (k = 0; k < n; ++k)
t += p[k] * r[k];
q[j] = t;
}
}
mm_destroy(n, c);
return m;
}
The original D1 code of the matrix mul, coming from the Java version:
double[][] matmul(double[][] a, double[][] b) {
int m = a.length, n = a[0].length, p = b[0].length;
double[][] x, c;
x.length = m; c.length = p;
for (int i = 0; i < m; ++i) x[i].length = p;
for (int i = 0; i < p; ++i) c[i].length = n;
for (int i = 0; i < n; ++i) // transpose
for (int j = 0; j < p; ++j)
c[j][i] = b[i][j];
for (int i = 0; i < m; ++i)
for (int j = 0; j < p; ++j) {
double s = 0.0;
for (int k = 0; k < n; ++k)
s += a[i][k] * c[j][k];
x[i][j] = s;
}
return x;
}
Improved D2 code:
http://codepad.org/aifQYhjI
pure nothrow double[][] matMul(in double[][] a, in double[][] b) {
immutable int m = a.length,
n = a[0].length,
p = b[0].length;
// transpose
auto c = new double[][](p, n);
foreach (i, brow; b)
foreach (j, bx; brow)
c[j][i] = bx;
auto x = new double[][](m, p);
foreach (i, arow; a)
foreach (j, crow; c) {
// x[i][j] = std.numeric.dotProduct(arow, crow); // right way
double s = 0.0;
foreach (k, arowk; arow)
s += arowk * crow[k];
x[i][j] = s;
}
return x;
}
Lua code:
function matrix.mul(a, b, n)
local y = matrix.new(n, n)
local c = matrix.T(b, n)
for i = 1, n - 1 do
for j = 1, n - 1 do
local sum = 0
for k = 0, n - 1 do sum = sum + a[i][k] * c[j][k] end
y[i][j] = sum
end
end
return y
end
--------------------
Timings, seconds:
ICC-12.0.3 C 1.8
GCC-4.3.2 C 2.3
GDC-0.24 D 8.8 (DMD gives similar timings)
Java at JRE-1.6.0_14 Java 2.9
LuaJIT-2.0.0-beta6 (JIT) Lua 2.7
Improved D version: about 4.4 seconds
Inner loop GCC:
L25:
fldl (%edx,%eax,8)
fmull (%ecx,%eax,8)
addl $1, %eax
cmpl %ebx, %eax
faddp %st, %st(1)
jne L25
Inner loop DMD, original version:
L153: mov EDX,4[EBX]
mov EAX,[EBX]
mov EAX,024h[ESP]
fld qword ptr [ESI*8][EDX]
mov EDX,028h[ESP]
mov EAX,[EDI*8][EDX]
mov EDX,4[EDI*8][EDX]
fmul qword ptr [ESI*8][EDX]
inc ESI
cmp ESI,EBP
fadd qword ptr 034h[ESP]
fstp qword ptr 034h[ESP]
jl L153
Inner loop DMD, improved:
L172: fld qword ptr [EBX*8][ECX]
fmul qword ptr [EBX*8][ESI]
inc EBX
cmp EBX,04Ch[ESP]
fadd qword ptr 054h[ESP]
fstp qword ptr 054h[ESP]
jb L172
The asm produced by DMD perfoms too many memory accesses in the inner loop. Is it possible to improve the DMD backend to remove this problem?
Bye,
bearophile
More information about the Digitalmars-d
mailing list