Performance issue with @fastmath and vectorization
deXtoRious via digitalmars-d-ldc
digitalmars-d-ldc at puremagic.com
Sat Nov 12 03:16:16 PST 2016
On Saturday, 12 November 2016 at 11:04:59 UTC, Johan Engelen
wrote:
> On Saturday, 12 November 2016 at 10:56:20 UTC, deXtoRious wrote:
>> On Saturday, 12 November 2016 at 10:47:42 UTC, Johan Engelen
>> wrote:
>>>
>>> Does the C++ need `__restrict__` for the parameters to get
>>> the assembly you want?
>>
>> In this case, it doesn't seem to make any difference.
>
> That's good news, because there is currently no way to add that
> to LDC code, afaik.
I hope it's somewhere on the roadmap for the future, as it does
still make a measurable difference in some cases.
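
For anyone following along, this is roughly what the qualifier in question looks like on the C++ side. It is a minimal sketch only: `scale_add` and its parameters are hypothetical names made up for illustration, not the benchmark kernel from this thread. `__restrict__` is a GCC/Clang extension, not standard C++.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical kernel for illustration, not the code benchmarked in this thread.
// __restrict__ (GCC/Clang extension) promises the compiler that out, a and b
// never overlap, so it does not need to emit runtime alias checks before
// vectorizing the loop.
void scale_add(float* __restrict__ out,
               const float* __restrict__ a,
               const float* __restrict__ b,
               float k,
               std::size_t n)
{
    // Debug-only sanity check: catches the most obvious violation of the promise.
    assert(out != a && out != b);

    for (std::size_t i = 0; i < n; ++i)
        out[i] = a[i] * k + b[i];
}
```

Whether the qualifier changes the generated assembly depends on the loop; as noted above, for this particular benchmark it did not seem to.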
>
> Hope you can try to cut more of these things from the example
> so it's easier to figure out why things are different. (e.g.
> is -Ofast needed, or is -O3 enough?)
>
> Thanks!
>
> cheers,
> Johan
-Ofast is also there out of habit; it doesn't make a meaningful
difference for a benchmark as simple as this. Other switches,
like -fno-rtti, -fno-exceptions and even -flto, can also be
dropped. Simply using -O3 -march=native -ffast-math is sufficient
to outperform LDC by 2.5x, losing only about 10% from the best
C++ performance and producing essentially the same unrolled,
FMA-enabled assembly with very minor changes.
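
To make the role of -ffast-math concrete, here is a small sketch of the kind of loop where it tends to matter. The function `dot` is a hypothetical stand-in, not the benchmark discussed here. It would be built with the reduced flag set mentioned above, e.g. `g++ -O3 -march=native -ffast-math`.

```cpp
#include <cstddef>

// Hypothetical reduction for illustration, not the thread's benchmark.
// Under strict IEEE semantics the additions into acc must happen in source
// order, which normally prevents GCC and Clang from vectorizing the
// accumulator. -ffast-math permits reassociation, so the compiler may keep
// several partial sums in vector registers and combine them at the end.
float dot(const float* a, const float* b, std::size_t n)
{
    float acc = 0.0f;
    for (std::size_t i = 0; i < n; ++i)
        acc += a[i] * b[i];
    return acc;
}
```

Because reassociation changes the order of the additions, the result can differ in the last bits between a build with -ffast-math and one without it.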