Performance issue with @fastmath and vectorization
deXtoRious via digitalmars-d-ldc
digitalmars-d-ldc at puremagic.com
Sat Nov 12 02:56:20 PST 2016
On Saturday, 12 November 2016 at 10:47:42 UTC, Johan Engelen
wrote:
> On Saturday, 12 November 2016 at 10:27:53 UTC, deXtoRious wrote:
>>
>> There are three vfmadd231ss in the entire assembly, but none
>> of them are in the inner loop. The presence of any FMA
>> instructions at all does show that the compiler properly
>> accepts the -mcpu switch, but it doesn't seem to recognize the
>> opportunities present in the inner loop.
>
> Does the C++ need `__restrict__` for the parameters to get the
> assembly you want?
In this case, it doesn't seem to make any difference. I habitually
use __restrict__ wherever possible in HPC code, but nowadays
Clang/GCC are often smart enough to vectorize such loops without
it, e.g. by proving the pointers don't overlap or by inserting a
runtime overlap check.
On that note, I was under the impression that D arrays carried a
no-aliasing guarantee. If that's not the case, is there a way to
achieve the equivalent of __restrict__ in D?
>
>> The assembly generated by the godbolt service seems largely
>> identical to the one I got on my local machine.
>
> It is easier for the discussion if you paste godbolt.org links
> btw, so we don't have to manually do it ourselves ;-)
>
> -Johan
Will do. :)
By the way, I have filed this as an issue on GitHub:
https://github.com/ldc-developers/ldc/issues/1874