Easiest way to use FMA instruction
Johan
j at j.nl
Fri Jan 10 00:02:52 UTC 2020
On Thursday, 9 January 2020 at 22:50:37 UTC, Ben Jones wrote:
> On Thursday, 9 January 2020 at 20:57:10 UTC, Ben Jones wrote:
>> What's the easiest way to use the FMA instruction (fused
>> multiply add that has nice rounding properties)? The FMA
>> function in Phobos just does a*b +c which will round twice.
>>
>> Do any of the intrinsics libraries include this? Should I
>> write my own inline ASM?
Why do you want to use the FMA instruction?
If for performance:
Inline assembly is generally very bad for performance as it
disables inlining and the compiler probably does not understand
the instruction itself (hence cannot combine it with other
optimizations). In this case you don't necessarily need the FMA
instruction (instead you want whatever instruction is fastest),
so you shouldn't force the compiler to use that instruction. Have
a look at https://github.com/AuburnSounds/intel-intrinsics, although
FMA is not supported there yet.
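A minimal sketch of what "not forcing the instruction" can look like (the function name `dot` is just an illustration): write the arithmetic plainly and leave the choice to the optimizer. Whether the multiply and add actually get fused depends on the compiler, the target CPU flags and the floating point settings, so don't read this as a guarantee that an FMA instruction is emitted.

```d
/// Plain dot product. No particular instruction is forced; the compiler
/// is free to pick whatever it considers fastest for the target.
double dot(const(double)[] x, const(double)[] y)
{
    assert(x.length == y.length, "inputs must have the same length");
    double sum = 0.0;
    foreach (i; 0 .. x.length)
        sum += x[i] * y[i]; // candidate for fusion, at the compiler's discretion
    return sum;
}
```

Because fusion is optional here, the last bits of the result may differ between builds, which is exactly why this approach is not suitable if you need the FMA rounding behavior.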
If only for the rounding behavior:
Then you do indeed need to force the compiler to use the FMA
instruction (also in non-optimized code, so you cannot rely on the
optimizer). Inline assembly is one solution. GDC and LDC provide a
better inline assembly method that preserves, among other things,
inlining potential and doesn't require hardcoded ABI details.
For LDC:
```
double fma(double a, double b, double c)
{
    import ldc.llvmasm;
    return __irEx!(
        `declare double @llvm.fma.f64(double %a, double %b, double %c)`,
        `%r = call double @llvm.fma.f64(double %0, double %1, double %2)
         ret double %r`,
        "",
        double, double, double, double)(a, b, c);
}
```
For details see https://wiki.dlang.org/LDC_inline_IR , although it is a
little outdated; see https://github.com/ldc-developers/ldc/issues/3271
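To illustrate what the single rounding buys you, here is a small sketch built on the same wrapper (repeated so the snippet stands alone). The helper name `productError` is made up for this example. It assumes LDC and a target where `double` arithmetic is really done in double precision (e.g. x86_64 with SSE), so that `a * b` is rounded to double before it is negated.

```d
// Same wrapper as above: forces LLVM's fused multiply-add (LDC only).
double fma(double a, double b, double c)
{
    import ldc.llvmasm;
    return __irEx!(
        `declare double @llvm.fma.f64(double %a, double %b, double %c)`,
        `%r = call double @llvm.fma.f64(double %0, double %1, double %2)
         ret double %r`,
        "",
        double, double, double, double)(a, b, c);
}

/// Hypothetical helper: the rounding error of the double product a*b,
/// i.e. (exact a*b) - (rounded a*b).
double productError(double a, double b)
{
    const double p = a * b; // rounded once, to double
    return fma(a, b, -p);   // exact a*b minus p, with a single final rounding
}
```

For inputs whose product is exactly representable, such as `productError(3.0, 4.0)`, the result is zero. With a plain `a * b - p` you would get zero for every input (assuming the compiler does not fuse it), because the product is rounded to the same `p` before the subtraction.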
cheers,
Johan