DMD 1.034 and 2.018 releases

Sat Aug 9 07:46:19 PDT 2008

== Quote from bearophile (bearophileHUGS at lycos.com)'s article
> First benchmark, just D against itself, not used GCC yet, the results show that
> vector ops are generally slower, but maybe there's some bug/problem in my
> benchmark (note it needs just Phobos!), not tested on Linux yet:

I see at least part of the problem.  When you use such huge arrays, the benchmark ends up
being more a test of your memory bandwidth than of the vector ops.  Three arrays
of 80000 ints come to a total of about 960k.  That is not going to fit in any L1
cache for a long time.  Heck, my CPU only has 512k of L2 cache per core.  Here are my
results using smaller arrays (4000 ints each), sized to fit in my 64k L1 data cache, and the same
code as bearophile's.
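The footprint arithmetic behind this, as a rough sketch: it assumes the benchmark keeps all three int arrays live in every pass, and uses the fact that D's int is always 4 bytes. It ignores stack, code, and any other data competing for the same caches.

```latex
\begin{aligned}
3 \times 80000 \times 4\,\text{B} &= 960000\,\text{B} \approx 937.5\,\text{KiB} > 512\,\text{KiB (L2 per core)} \\
3 \times 4000 \times 4\,\text{B} &= 48000\,\text{B} \approx 46.9\,\text{KiB} < 64\,\text{KiB (L1 data)}
\end{aligned}
```

So the original runs spill out of L2 as well as L1, while the 4000-element runs should stay L1-resident.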

+ operator:

D:\code>array_benchmark.exe 500 1000000 0
array len= 4000  nloops= 1000000  Use vec ops: false
time= 4.82841 s

D:\code>array_benchmark.exe 500 1000000 1
array len= 4000  nloops= 1000000  Use vec ops: true
time= 2.32902 s

* operator:

D:\code>array_benchmark.exe 500 1000000 0
array len= 4000  nloops= 1000000  Use vec ops: false
time= 6.1556 s

D:\code>array_benchmark.exe 500 1000000 1
array len= 4000  nloops= 1000000  Use vec ops: true
time= 6.16539 s

/ operator (note: nloops is 100000 here, one tenth of the runs above):

D:\code>array_benchmark.exe 500 100000 0
array len= 4000  nloops= 100000  Use vec ops: false
time= 7.02435 s

D:\code>array_benchmark.exe 500 100000 1
array len= 4000  nloops= 100000  Use vec ops: true
time= 6.84251 s
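Dividing the plain-loop time by the vector-op time for each operator gives a rough speedup from these single runs. The / pair used one tenth as many loops, so compare it only within that pair, not against the + and * timings.

```latex
\begin{aligned}
\texttt{+} &: \frac{4.82841}{2.32902} \approx 2.07 \\
\texttt{*} &: \frac{6.1556}{6.16539} \approx 1.00 \\
\texttt{/} &: \frac{7.02435}{6.84251} \approx 1.03
\end{aligned}
```

With L1-resident arrays, addition roughly doubles in speed, while multiplication and division come out essentially even in these runs.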

BTW, for the sake of comparison, here are my CPU specs from CPU-Z.  Also note that
I'm running in 32-bit mode.

Number of processors	1
Number of cores	2 per processor
Number of threads	2 (max 2) per processor
Name	AMD Athlon 64 X2 3600+
Code Name	Brisbane
Specification	AMD Athlon(tm) 64 X2 Dual Core Processor 3600+
Package	Socket AM2 (940)
Family/Model/Stepping	F.B.1
Extended Family/Model	F.6B
Brand ID	4
Core Stepping	BH-G1
Technology	65 nm
Core Speed	2698.1 MHz
Multiplier x Bus speed	9.5 x 284.0 MHz
HT Link speed	852.0 MHz
Stock frequency	1900 MHz
Instruction sets	MMX (+), 3DNow! (+), SSE, SSE2, SSE3, x86-64
L1 Data cache (per processor)	2 x 64 KBytes, 2-way set associative, 64-byte line size
L1 Instruction cache (per processor)	2 x 64 KBytes, 2-way set associative, 64-byte line size
L2 cache (per processor)	2 x 512 KBytes, 16-way set associative, 64-byte line size