Performance issue with @fastmath and vectorization
Johan Engelen via digitalmars-d-ldc
digitalmars-d-ldc at puremagic.com
Sat Nov 12 02:47:42 PST 2016
On Saturday, 12 November 2016 at 10:27:53 UTC, deXtoRious wrote:
>
> There are three vfmadd231ss in the entire assembly, but none of
> them are in the inner loop. The presence of any FMA
> instructions at all does show that the compiler properly
> accepts the -mcpu switch, but it doesn't seem to recognize the
> opportunities present in the inner loop.
Does the C++ need `__restrict__` for the parameters to get the
assembly you want?
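
To illustrate the question, here is a minimal sketch. The kernel and the names `axpy_plain` and `axpy_restrict` are hypothetical, not taken from your benchmark, and `__restrict__` is a GCC/Clang extension rather than standard C++. Without the qualifier the compiler has to assume `out` may overlap `a` or `b`, so it may add runtime overlap checks or vectorize less aggressively. With it, the loop is a candidate for packed FMA instructions when built with optimization and a suitable -mcpu/-march setting, though that still depends on the compiler and its floating-point contraction settings.

```cpp
#include <cstddef>

// Hypothetical kernel for illustration only.
// No aliasing information: `out` could overlap `a` or `b`.
void axpy_plain(float* out, const float* a, const float* b,
                float k, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = a[i] * k + b[i];
}

// Same loop, but the caller promises that the three arrays do not overlap.
// __restrict__ is a compiler extension (GCC, Clang), not standard C++.
void axpy_restrict(float* __restrict__ out,
                   const float* __restrict__ a,
                   const float* __restrict__ b,
                   float k, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = a[i] * k + b[i];
}
```

Comparing the assembly of the two functions should show whether the aliasing information is what makes the difference.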
> The assembly generated by the godbolt service seems largely
> identical to the one I got on my local machine.
Btw, the discussion is easier if you paste godbolt.org links, so
we don't have to reproduce it ourselves ;-)
-Johan