Why 16MiB static array size limit?
Ali Çehreli via Digitalmars-d
digitalmars-d at puremagic.com
Mon Aug 15 18:28:05 PDT 2016
On 08/15/2016 12:09 PM, Ali Çehreli wrote:
> dmd does not allow anything larger.
Could you please help me understand the following results, possibly by
analyzing the produced assembly?
I wanted to see whether there is any performance penalty in following
D's recommendation to use dynamic arrays for sizes beyond 16MiB.
Here is the test code:
enum size = 15 * 1024 * 1024;

version (STATIC) {
    ubyte[size] arr;
}
else {
    ubyte[] arr;

    static this() {
        arr = new ubyte[](size);
    }
}

void main() {
    auto p = arr.ptr;

    foreach (j; 0 .. 100) {
        foreach (i; 0 .. arr.length) {
            version (POINTER) {
                p[i] += cast(ubyte)i;
            }
            else {
                arr[i] += cast(ubyte)i;
            }
        }
    }
}
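For anyone who wants to compare the same four variants against a C compiler, here is a rough C analogue. It is a sketch under stated assumptions, not a translation: a global pointer/length pair stands in for the D slice `arr`, `calloc` stands in for `new ubyte[](size)` (both zero-fill), and all function names are hypothetical. One known difference is that D module-level variables are thread-local by default, while these C globals are not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Mirrors the D program's `enum size = 15 * 1024 * 1024`. */
#define SIZE ((size_t)15 * 1024 * 1024)

/* Analogue of `version (STATIC)`: fixed-size array in static storage. */
static unsigned char static_arr[SIZE];

/* Hypothetical analogue of the dynamic `ubyte[] arr`: a global
   pointer/length pair, roughly what a D slice holds. */
static unsigned char *dyn_ptr;
static size_t dyn_len;

/* Analogue of `static this() { arr = new ubyte[](size); }`. */
void init_dynamic(size_t len) {
    dyn_ptr = calloc(len, 1); /* zero-filled, like `new ubyte[]` */
    assert(dyn_ptr != NULL);
    dyn_len = len;
}

/* `arr[i]` analogue, static array: index the global array directly. */
void run_indexed_static(int passes) {
    for (int j = 0; j < passes; ++j)
        for (size_t i = 0; i < SIZE; ++i)
            static_arr[i] += (unsigned char)i;
}

/* `p[i]` analogue, static array: go through a local pointer copy. */
void run_pointer_static(int passes) {
    unsigned char *p = static_arr;
    for (int j = 0; j < passes; ++j)
        for (size_t i = 0; i < SIZE; ++i)
            p[i] += (unsigned char)i;
}

/* `arr[i]` analogue, dynamic array: read the global pointer and
   length through the globals on every iteration. */
void run_indexed_dynamic(int passes) {
    for (int j = 0; j < passes; ++j)
        for (size_t i = 0; i < dyn_len; ++i)
            dyn_ptr[i] += (unsigned char)i;
}

/* `p[i]` analogue, dynamic array: copy the base pointer into a local
   first (like `auto p = arr.ptr;`), but, as in the D code, the loop
   bound still reads the global length. */
void run_pointer_dynamic(int passes) {
    unsigned char *p = dyn_ptr;
    for (int j = 0; j < passes; ++j)
        for (size_t i = 0; i < dyn_len; ++i)
            p[i] += (unsigned char)i;
}
```

One C-specific detail that may matter when reading the generated assembly: a store through an `unsigned char` lvalue is allowed to alias any object, including the globals `dyn_ptr` and `dyn_len`, so a C compiler may have to reload them on every iteration of `run_indexed_dynamic`. The local copy in `run_pointer_dynamic` avoids reloading the pointer, though not the length. Whether dmd or ldc2 treat `ubyte` stores the same way is not established here.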
My CPU is an i7 with a 4MiB L3 cache:
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 2
Core(s) per socket: 2
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 78
Model name: Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz
Stepping: 3
CPU MHz: 513.953
CPU max MHz: 3400.0000
CPU min MHz: 400.0000
BogoMIPS: 5615.89
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 4096K
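Putting the array size from the test code next to the cache sizes above (plain arithmetic on numbers already given; it does not by itself explain the differences between the variants): the array is presumably 15MiB so that it stays under the 16MiB static-array limit, and it is almost four times the size of the L3 cache, so it cannot fit in that cache as a whole while each of the 100 outer passes reads and writes every byte.

```latex
\begin{aligned}
\text{size} &= 15 \times 1024 \times 1024 = 15\,728\,640\ \text{bytes} = 15\ \text{MiB} \\
            &< 16\ \text{MiB} = 16\,777\,216\ \text{bytes} \\
\frac{\text{size}}{\text{L3 size}} &= \frac{15\ \text{MiB}}{4096\ \text{KiB}} = \frac{15\ \text{MiB}}{4\ \text{MiB}} = 3.75 \\
\text{bytes updated per run} &= 100 \times 15\ \text{MiB} = 1500\ \text{MiB}
\end{aligned}
```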
I tried two compilers:
- DMD64 D Compiler v2.071.2-b2
- LDC - the LLVM D compiler (1.0.0), based on DMD v2.070.2 and LLVM 3.8.0
As seen in the code, I tried two version identifiers:
- STATIC: use a static array (otherwise, use a dynamic array)
- POINTER: access array elements through .ptr (otherwise, access them
  through the [] operator)
So, two compilers, two kinds of array and two access methods gave me 8
combinations. Below, I list both the compilation command line that I
used and the wall-clock time that each program execution took (as
reported by the 'time' utility).
1) dmd deneme.d -ofdeneme -O -boundscheck=off -inline -version=STATIC -version=POINTER
   4.332s
2) dmd deneme.d -ofdeneme -O -boundscheck=off -inline -version=STATIC
   4.238s
3) dmd deneme.d -ofdeneme -O -boundscheck=off -inline -version=POINTER
   4.321s
4) dmd deneme.d -ofdeneme -O -boundscheck=off -inline
   3.845s <== BEST for dmd
5) ldc2 deneme.d -ofdeneme -O5 -release -boundscheck=off -d-version=POINTER -d-version=STATIC
   0.469s
6) ldc2 deneme.d -ofdeneme -O5 -release -boundscheck=off -d-version=STATIC
   0.472s
7) ldc2 deneme.d -ofdeneme -O5 -release -boundscheck=off -d-version=POINTER
   0.182s <== BEST for ldc2
8) ldc2 deneme.d -ofdeneme -O5 -release -boundscheck=off
   0.792s
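The ratios between these timings can be computed directly from the listed numbers. The small C helper below is a hypothetical convenience (the names `slowdown` and `fastest` are made up for illustration); the arrays hold exactly the eight wall-clock times reported above and nothing else.

```c
#include <assert.h>
#include <stddef.h>

/* Wall-clock times in seconds, copied from the results above.
   dmd_times[0..3] are results 1-4; ldc_times[0..3] are results 5-8. */
static const double dmd_times[4] = {4.332, 4.238, 4.321, 3.845};
static const double ldc_times[4] = {0.469, 0.472, 0.182, 0.792};

/* How many times slower time `t` is than time `best`. */
double slowdown(double t, double best) {
    assert(best > 0.0);
    return t / best;
}

/* Index of the smallest (fastest) time in `times[0..n)`. */
size_t fastest(const double *times, size_t n) {
    size_t best = 0;
    for (size_t i = 1; i < n; ++i)
        if (times[i] < times[best])
            best = i;
    return best;
}
```

For example, `slowdown(ldc_times[3], ldc_times[2])` is about 4.35 (result 8 vs. result 7), and `slowdown(dmd_times[2], dmd_times[3])` is about 1.12 (result 3 vs. result 4).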
So, for dmd, following the recommendation and using a dynamic array is
faster. Interestingly, using .ptr is actually slower (results 1 and 3
vs. 2 and 4). How?
With ldc2, the best option is to go with a dynamic array ONLY IF you
access the elements through the .ptr property. As seen in the last
result, using the [] operator on the dynamic array is about 4 times
slower than that (0.792s vs. 0.182s).
Does that make sense to you? Why would that be?
Thank you,
Ali