Performance issue with @fastmath and vectorization

Johan Engelen via digitalmars-d-ldc digitalmars-d-ldc at puremagic.com
Sat Nov 12 02:47:42 PST 2016


On Saturday, 12 November 2016 at 10:27:53 UTC, deXtoRious wrote:
>
> There are three vfmadd231ss in the entire assembly, but none of 
> them are in the inner loop. The presence of any FMA 
> instructions at all does show that the compiler properly 
> accepts the -mcpu switch, but it doesn't seem to recognize the 
> opportunities present in the inner loop.

Does the C++ need `__restrict__` for the parameters to get the 
assembly you want?
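
To make the question concrete, here is a minimal sketch of what I mean. The kernel below is hypothetical (it is not the code from this thread), and the names `fma_plain` and `fma_restrict` are made up for illustration. `__restrict__` is the GCC/Clang spelling of the extension; other compilers may spell it differently.

```cpp
#include <cstddef>

// Hypothetical multiply-accumulate kernel, NOT the code from this thread.
// Without __restrict__, the compiler has to allow for `out` overlapping
// `a` or `b`, so it may add runtime overlap checks or stay scalar.
void fma_plain(float* out, const float* a, const float* b, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] += a[i] * b[i];
}

// Same loop, but the caller promises that the three arrays do not overlap.
// That promise removes the aliasing concern; whether the inner loop then
// actually becomes packed FMA still depends on the compiler and flags.
void fma_restrict(float* __restrict__ out,
                  const float* __restrict__ a,
                  const float* __restrict__ b,
                  std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] += a[i] * b[i];
}
```

Both functions compute the same result when the arrays are distinct. Comparing their assembly (for example with something like `-O3 -march=haswell -ffast-math` on GCC or Clang) should show whether aliasing is what holds back the inner loop.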

> The assembly generated by the godbolt service seems largely 
> identical to the one I got on my local machine.

By the way, it makes the discussion easier if you paste godbolt.org 
links, so we don't have to reproduce the setup ourselves ;-)

-Johan


