Optimizing for SIMD: best practices? (i.e. what features are allowed?)
tsbockman
thomas.bockman at gmail.com
Fri Feb 26 03:57:12 UTC 2021
On Thursday, 25 February 2021 at 11:28:14 UTC, z wrote:
>>float euclideanDistanceFixedSizeArray(float[3] a, float[3] b) {
Use __vector(float[4]), not float[3].
>> float distance;
The default value of a float is float.nan, so every `distance += ...`
just propagates NaN. You need to explicitly initialize it (to 0.0f,
for example) if you want this function to return anything useful.
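A minimal sketch of the NaN problem. The two helpers `sumUninitialized` and `sumInitialized` are hypothetical, written only to contrast a default-initialized accumulator with an explicitly zeroed one:

```d
import std.math : isNaN;

// Mirrors the original bug: `total` starts as float.init, which is NaN.
float sumUninitialized(float[3] v)
{
    float total;          // default-initialized to float.nan
    foreach (x; v)
        total += x;       // NaN + x is still NaN
    return total;
}

// Fixed: the accumulator is explicitly initialized to zero.
float sumInitialized(float[3] v)
{
    float total = 0.0f;
    foreach (x; v)
        total += x;
    return total;
}
```

With input `[1.0f, 2.0f, 3.0f]`, the first returns NaN and the second returns 6.0f.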
>> a[] -= b[];
>> a[] *= a[];
With __vector types, this can be written more simply (though it won't
be any faster) as just:
    a -= b;
    a *= a;
>> static foreach(size_t i; 0 .. 3/+typeof(a).length+/){
>> distance += a[i].abs;//abs required by the caller
(a * a) above is always non-negative for real numbers, so you don't
need the call to abs unless you're trying to guarantee that even NaN
values will have a clear sign bit.
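A small sketch of why abs is redundant after squaring. The helper `squared` is hypothetical; the point is that for any non-NaN float the product x * x never has its sign bit set, so only a NaN could still need fabs to clear it:

```d
import std.math : copysign, fabs, signbit;

// For any non-NaN x, x * x >= 0 (and -0.0 * -0.0 == +0.0),
// so fabs(x * x) == x * x and the abs call changes nothing.
float squared(float x)
{
    return x * x;
}

// A NaN, however, may carry a set sign bit; fabs clears it.
float negativeNaN()
{
    return copysign(float.nan, -1.0f);
}
```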
Also, there is no point in adding the first component to zero, and
copying element [0] from a SIMD register into a scalar is free, so
the loop can become:
    float distance = a[0];
    static foreach(size_t i; 1 .. 3)
        distance += a[i];
>> }
>> return sqrt(distance);
>>}
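Putting the suggestions together, the full function might look roughly like this. It is a sketch assuming an x86-64 target where `core.simd.float4` (an alias for `__vector(float[4])`) is available; the name `euclideanDistance` is hypothetical, and elements are read through `.array` rather than direct indexing. It has not been checked to produce exactly the assembly listed below:

```d
import core.simd : float4;
import std.math : sqrt;

// Sketch: __vector(float[4]) instead of float[3], whole-vector
// arithmetic, and a reduction that starts from element [0].
// Only lanes 0..2 are summed, so lane 3 acts as ignored padding.
float euclideanDistance(float4 a, float4 b)
{
    a -= b;                        // element-wise difference
    a *= a;                        // element-wise square, already >= 0
    float distance = a.array[0];   // no need to start from 0.0f
    static foreach (size_t i; 1 .. 3)
        distance += a.array[i];
    return sqrt(distance);
}
```

For example, the distance between (0, 0, 0) and (3, 4, 0) comes out as 5, whatever value sits in the fourth lane.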
Final assembly output (LDC 1.24.0 with -release -O3
-preview=intpromote -preview=dip1000 -m64 -mcpu=haswell
-fp-contract=fast -enable-cross-module-inlining):
    vsubps xmm0, xmm1, xmm0
    vmulps xmm0, xmm0, xmm0
    vmovshdup xmm1, xmm0
    vaddss xmm1, xmm0, xmm1
    vpermilpd xmm0, xmm0, 1
    vaddss xmm0, xmm0, xmm1
    vsqrtss xmm0, xmm0, xmm0
    ret
More information about the Digitalmars-d-learn mailing list