Vector operations optimization.

Dmitry Olshansky dmitry.olsh at gmail.com
Fri Mar 23 03:48:50 PDT 2012


On 23.03.2012 9:57, Comrad wrote:
> On Thursday, 22 March 2012 at 10:43:35 UTC, Trass3r wrote:
>>> What is the status at the moment? Which compiler and which compiler
>>> flags should I use to achieve maximum performance?
>>
>> In general gdc or ldc. Not sure how good vectorization is though, esp.
>> auto-vectorization.
>> On the other hand, the so-called vector operations like a[] = b[] +
>> c[]; are lowered to hand-written SSE assembly even in dmd.
>
> I used this snippet to test:
>
> import std.stdio;
> void main()
> {
>     double[2] a=[1.,0.];
>     double[2] a1=[1.,0.];
>     double[2] a2=[1.,0.];
>     double[2] a3=[0.,0.];

Here is the culprit: the array ops [] are tuned for arbitrarily long(!)
arrays; they are not a single plain SIMD SSE op. They are handcrafted loops(!)
over SSE ops, cool and fast for arrays in general, but not for fixed
pairs/trios/etc. I believe this might change in the future, if the compiler
becomes able to deduce that the size is fixed and use more optimal code for
small sizes.
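
To make the difference concrete, here is a minimal sketch of the two forms
side by side. The helper names accumulateSliced and accumulateUnrolled are
made up for illustration and do not come from the original snippets. The first
goes through the general array-op path, the second spells out the two elements
by hand. Both should give the same result; only the generated code differs,
and how large the gap is depends on the compiler and its version.

```d
// Hypothetical helpers for illustration only.

/// Array-op form: handled by the general array-op machinery,
/// which is written with arbitrary-length arrays in mind.
void accumulateSliced(ref double[2] acc, double[2] a, double[2] b, double[2] c)
{
    acc[] += a[] + b[] * c[];
}

/// Hand-unrolled form: two plain scalar updates that the
/// optimizer can keep in registers for a fixed-size pair.
void accumulateUnrolled(ref double[2] acc, double[2] a, double[2] b, double[2] c)
{
    acc[0] += a[0] + b[0] * c[0];
    acc[1] += a[1] + b[1] * c[1];
}

unittest
{
    double[2] a = [1.0, 0.0];
    double[2] b = [1.0, 0.0];
    double[2] c = [1.0, 0.0];
    double[2] s = [0.0, 0.0];
    double[2] u = [0.0, 0.0];

    foreach (i; 0 .. 1000)
    {
        accumulateSliced(s, a, b, c);
        accumulateUnrolled(u, a, b, c);
    }

    // Both forms compute the same values.
    assert(s == u);
    assert(u[0] == 2000.0 && u[1] == 0.0);
}
```

For small fixed sizes like this, the unrolled form is the one to prefer in a
hot loop until the compiler special-cases short static arrays.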

>     foreach(i;0..1000000000)
>         a3[]+=a[]+a1[]*a2[];
>     writeln(a3);
> }
>
> And I compared with the following D code:
>
> import std.stdio;
> void main()
> {
>     double[2] a=[1.,0.];
>     double[2] a1=[1.,0.];
>     double[2] a2=[1.,0.];
>     double[2] a3=[0.,0.];
>     foreach(i;0..1000000000)
>     {
>         a3[0]+=a[0]+a1[0]*a2[0];
>         a3[1]+=a[1]+a1[1]*a2[1];
>     }
>     writeln(a3);
> }
>
> And with the following C code:
>
> #include <stdio.h>
> int main()
> {
>     double a[2]={1.,0.};
>     double a1[2]={1.,0.};
>     double a2[2]={1.,0.};
>     double a3[2]={0.,0.}; /* zero-initialized, matching the D versions */
>     unsigned i;
>     for(i=0;i<1000000000;++i)
>     {
>         a3[0]+=a[0]+a1[0]*a2[0];
>         a3[1]+=a[1]+a1[1]*a2[1];
>     }
>     printf("%f %f\n",a3[0],a3[1]);
>     return 0;
> }
>
> I compiled the last one with gcc and the previous two with dmd and ldc.
> The C code compiled with -O2 was the fastest, and it was as fast as the D
> code without slicing compiled with ldc. The D code with slicing was 3 times
> slower (ldc compiler). I tried compiling with different optimization flags,
> but that didn't help. Maybe I used the wrong ones. Can someone comment on
> this?


-- 
Dmitry Olshansky


More information about the Digitalmars-d-learn mailing list