Why 16MiB static array size limit?

Mon Aug 15 18:28:05 PDT 2016

On 08/15/2016 12:09 PM, Ali Çehreli wrote:
> dmd does not allow anything larger.

Could you please help me understand the following results, possibly by 
analyzing the produced assembly?

I wanted to see whether there is any performance penalty for following 
D's recommendation to use dynamic arrays for anything beyond 16MiB. The 
test uses 15MiB so that the static array version still compiles.

Here is the test code:

enum size = 15 * 1024 * 1024;   // 15 MiB, below the 16 MiB limit

version (STATIC) {
    ubyte[size] arr;            // static array
}
else {
    ubyte[] arr;                // dynamic array, allocated at startup

    static this() {
        arr = new ubyte[](size);
    }
}

void main() {
    auto p = arr.ptr;

    foreach (j; 0 .. 100) {
        foreach (i; 0 .. arr.length) {
            version (POINTER) {
                p[i] += cast(ubyte)i;      // access through .ptr
            }
            else {
                arr[i] += cast(ubyte)i;    // access through []
            }
        }
    }
}

My CPU is an i7 with 4 MiB of L3 cache:

$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    2
Core(s) per socket:    2
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 78
Model name:            Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz
Stepping:              3
CPU MHz:               513.953
CPU max MHz:           3400.0000
CPU min MHz:           400.0000
BogoMIPS:              5615.89
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              4096K

I tried two compilers:

- DMD64 D Compiler v2.071.2-b2

- LDC - the LLVM D compiler (1.0.0):
    based on DMD v2.070.2 and LLVM 3.8.0

As seen in the code, I tried two version identifiers:

-  STATIC: Use static array
-    else: Use dynamic array

- POINTER: Access array elements through .ptr
-    else: Access array elements through the [] operator

Two compilers, two array kinds, and two access styles gave me 8 
combinations. Below, I list the compilation command line that I used for 
each and the wall-clock time that the resulting program took (as 
reported by the 'time' utility).

1) dmd deneme.d -ofdeneme -O -boundscheck=off -inline -version=STATIC -version=POINTER

    4.332s

2) dmd deneme.d -ofdeneme -O -boundscheck=off -inline -version=STATIC

    4.238s

3) dmd deneme.d -ofdeneme -O -boundscheck=off -inline -version=POINTER

    4.321s

4) dmd deneme.d -ofdeneme -O -boundscheck=off -inline

    3.845s  <== BEST for dmd

5) ldc2 deneme.d -ofdeneme  -O5 -release -boundscheck=off -d-version=POINTER -d-version=STATIC

    0.469s

6) ldc2 deneme.d -ofdeneme  -O5 -release -boundscheck=off -d-version=STATIC

   0.472s

7) ldc2 deneme.d -ofdeneme  -O5 -release -boundscheck=off -d-version=POINTER

   0.182s  <== BEST for ldc2

8) ldc2 deneme.d -ofdeneme  -O5 -release -boundscheck=off

   0.792s
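
The times above come from the 'time' utility, so they also include 
process startup and, for the dynamic array, the allocation in 
static this(). To see only the loops, they could be timed inside the 
program. Here is a minimal sketch of that, assuming core.time.MonoTime 
is available in both compilers' runtimes; timeLoops is a name made up 
for illustration, and only the dynamic array with [] is shown:

```d
import core.time : MonoTime, Duration;
import std.stdio : writeln;

enum size = 15 * 1024 * 1024;

ubyte[] arr;

static this() {
    arr = new ubyte[](size);
}

// Runs the same update as the test program 'passes' times and returns
// how long the loops alone took (hypothetical helper).
Duration timeLoops(size_t passes) {
    auto start = MonoTime.currTime;

    foreach (j; 0 .. passes) {
        foreach (i; 0 .. arr.length) {
            arr[i] += cast(ubyte)i;
        }
    }

    return MonoTime.currTime - start;
}

void main() {
    writeln("loops only: ", timeLoops(100).total!"msecs", " ms");
}
```

If the ranking stays the same with this kind of measurement, the 
differences are in the loops themselves and not in the startup cost.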

So, for dmd, following the recommendation and using a dynamic array is 
the fastest (3.845s), but only with the [] operator. Interestingly, 
using .ptr is slower for both array kinds (4.321s and 4.332s). How?

With ldc2, the best option is to go with a dynamic array ONLY IF you 
access the elements through the .ptr property (0.182s). As seen in the 
last result, using the [] operator on the same array (0.792s) is more 
than 4 times slower than that, and slower than either static array 
variant (about 0.47s).

Does that make sense to you? Why would that be?
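
One experiment that might narrow this down, offered as a guess rather 
than a verified explanation: arr is a module-level variable, which D 
makes thread-local by default, and arr[i] has to go through arr's own 
pointer and length. If the compiler cannot prove that the ubyte store 
leaves those fields untouched, it may reload them on every iteration. 
Copying the slice into a local first would take that out of the picture. 
updateViaLocal below is a made-up name for illustration:

```d
enum size = 15 * 1024 * 1024;

// Module-level variables are thread-local by default in D.
ubyte[] arr;

static this() {
    arr = new ubyte[](size);
}

// Same update as the test program, but through a local copy of the
// slice (hypothetical helper).
void updateViaLocal(size_t passes) {
    // Copy the pointer and length once; the elements are still shared
    // with arr, so the writes below land in the same memory.
    auto local = arr;

    foreach (j; 0 .. passes) {
        foreach (i; 0 .. local.length) {
            local[i] += cast(ubyte)i;
        }
    }
}

void main() {
    updateViaLocal(100);
}
```

If this variant gets close to the .ptr timing, the cost would be in how 
arr itself is reached and not in the [] operator as such.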

Thank you,
Ali