Performance issue with @fastmath and vectorization
LiNbO3 via digitalmars-d-ldc
digitalmars-d-ldc at puremagic.com
Sat Nov 12 01:45:29 PST 2016
On Saturday, 12 November 2016 at 00:03:16 UTC, dextorious wrote:
> On my Haswell i7-4710HQ machine the C++ version runs in
> ~10ms/iteration while the D code takes 25ms. Comparing profiler
> output with the generated assembly code quickly reveals the
> reason - while Clang fully unrolls the inner loop and uses FMA
> instructions wherever possible, the inner loop assembly
> produced by LDC looks like this:
When I compile your code with the same set of flags you used on
godbolt (https://d.godbolt.org/), I do see FMA instructions in the
generated assembly.