DMD 1.034 and 2.018 releases
dsimcha
dsimcha at yahoo.com
Sat Aug 9 07:46:19 PDT 2008
== Quote from bearophile (bearophileHUGS at lycos.com)'s article
> First benchmark, just D against itself, not used GCC yet, the results show that
> vector ops are generally slower, but maybe there's some bug/problem in my
> benchmark (note it needs just Phobos!), not tested on Linux yet:
I see at least part of the problem. When you use such huge arrays, the benchmark
ends up testing your memory bandwidth more than the vector ops. Three arrays
of 80000 ints come to about 960k in total (3 x 80000 x 4 bytes). That is not
going to fit in any L1 cache for a long time. Heck, my CPU only has 512k of L2
cache per core. Here are my results using smaller arrays designed to fit in my
64k L1 data cache, with the same code as Bearophile.
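The cache argument above is plain arithmetic, and a minimal sketch in current D2 makes it easy to check. It assumes three int arrays are live at once (two operands and a result) and uses the 64k L1 data and 512k L2 figures quoted above; the helper name workingSetBytes is made up for illustration and is not part of Bearophile's benchmark.

```d
import std.stdio;

/// Bytes touched per pass, assuming `nArrays` int arrays of `len` elements.
/// (Hypothetical helper for illustration, not from the original benchmark.)
size_t workingSetBytes(size_t len, size_t nArrays = 3)
{
    return nArrays * len * int.sizeof;
}

void main()
{
    enum size_t l1Data = 64 * 1024;   // 64k L1 data cache per core
    enum size_t l2     = 512 * 1024;  // 512k L2 cache per core

    // 80000 is the original array length, 4000 the reduced one.
    size_t[] lengths = [80_000, 4_000];

    foreach (len; lengths)
    {
        immutable bytes = workingSetBytes(len);
        writefln("len=%s: %s bytes, fits L1: %s, fits L2: %s",
                 len, bytes, bytes <= l1Data, bytes <= l2);
    }
}
```

With 80000 elements the three arrays need 960000 bytes, which exceeds both caches; with 4000 elements they need 48000 bytes, which fits in the 64k L1 data cache.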
+ operator:
D:\code>array_benchmark.exe 500 1000000 0
array len= 4000 nloops= 1000000 Use vec ops: false
time= 4.82841 s
D:\code>array_benchmark.exe 500 1000000 1
array len= 4000 nloops= 1000000 Use vec ops: true
time= 2.32902 s
* operator:
D:\code>array_benchmark.exe 500 1000000 0
array len= 4000 nloops= 1000000 Use vec ops: false
time= 6.1556 s
D:\code>array_benchmark.exe 500 1000000 1
array len= 4000 nloops= 1000000 Use vec ops: true
time= 6.16539 s
/ operator:
D:\code>array_benchmark.exe 500 100000 0
array len= 4000 nloops= 100000 Use vec ops: false
time= 7.02435 s
D:\code>array_benchmark.exe 500 100000 1
array len= 4000 nloops= 100000 Use vec ops: true
time= 6.84251 s
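For readers without the benchmark source at hand, the "Use vec ops" flag presumably switches between an explicit element-wise loop and D's array operation syntax. The sketch below is my own assumption of what the two variants look like for the + operator, written in current D2; it is not Bearophile's code, the function names addLoop and addVec are invented, and timing is left out on purpose. It only checks that both forms compute the same result.

```d
import std.stdio;

/// Element-wise sum written as an explicit loop (the "vec ops: false" style).
void addLoop(const int[] a, const int[] b, int[] c)
{
    foreach (i; 0 .. c.length)
        c[i] = a[i] + b[i];
}

/// The same sum written as a D array operation (the "vec ops: true" style).
void addVec(const int[] a, const int[] b, int[] c)
{
    c[] = a[] + b[];
}

void main()
{
    // 4000 elements, matching the array length reported in the runs above.
    auto a = new int[4000];
    auto b = new int[4000];
    foreach (i; 0 .. a.length)
    {
        a[i] = cast(int) i;
        b[i] = cast(int) (2 * i);
    }

    auto viaLoop = new int[a.length];
    auto viaVec  = new int[a.length];
    addLoop(a, b, viaLoop);
    addVec(a, b, viaVec);

    // Both variants should agree element for element.
    writeln(viaLoop == viaVec);
}
```

Swapping + for * or / in both functions gives the other two cases (for division, the divisor array must not contain zeros).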
BTW, for the sake of comparison, here are my CPU specs from CPU-Z. Also note that
I'm running in 32-bit mode.
Number of processors                  1
Number of cores                       2 per processor
Number of threads                     2 (max 2) per processor
Name                                  AMD Athlon 64 X2 3600+
Code Name                             Brisbane
Specification                         AMD Athlon(tm) 64 X2 Dual Core Processor 3600+
Package                               Socket AM2 (940)
Family/Model/Stepping                 F.B.1
Extended Family/Model                 F.6B
Brand ID                              4
Core Stepping                         BH-G1
Technology                            65 nm
Core Speed                            2698.1 MHz
Multiplier x Bus speed                9.5 x 284.0 MHz
HT Link speed                         852.0 MHz
Stock frequency                       1900 MHz
Instruction sets                      MMX (+), 3DNow! (+), SSE, SSE2, SSE3, x86-64
L1 Data cache (per processor)         2 x 64 KBytes, 2-way set associative, 64-byte line size
L1 Instruction cache (per processor)  2 x 64 KBytes, 2-way set associative, 64-byte line size
L2 cache (per processor)              2 x 512 KBytes, 16-way set associative, 64-byte line size