Performance issue with @fastmath and vectorization

Johan Engelen via digitalmars-d-ldc digitalmars-d-ldc at puremagic.com
Sat Nov 12 04:11:35 PST 2016


On Saturday, 12 November 2016 at 11:16:16 UTC, deXtoRious wrote:
> On Saturday, 12 November 2016 at 11:04:59 UTC, Johan Engelen 
> wrote:
>> On Saturday, 12 November 2016 at 10:56:20 UTC, deXtoRious 
>> wrote:
>>> On Saturday, 12 November 2016 at 10:47:42 UTC, Johan Engelen 
>>> wrote:
>>>>
>>>> Does the C++ need `__restrict__` for the parameters to get 
>>>> the assembly you want?
>>>
>>> In this case, it doesn't seem to make any difference.
>>
>> That's good news, because there is currently no way to add 
>> that to LDC code, afaik.
>
> I hope it's somewhere on the roadmap for the future, as it does 
> still make a measurable difference in some cases.

Can you file an issue for that too? (Ideas in forum posts get 
lost instantly.)
Make sure you add a testcase, as small as possible, that shows a 
clear difference in C++ codegen with and without `__restrict__`, 
and the worse codegen for the equivalent D code, which cannot use 
it.
It may be relatively easy to implement in LDC, but I don't think 
many people know the intricacies of C's restrict. Examples of the 
effect it has on assembly (clang C++) help a lot towards getting 
it implemented.
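
A minimal sketch of the kind of testcase meant here. The kernel 
`add_scaled` and its signature are hypothetical and are not the 
code from this thread; the only difference between the two 
versions is the `__restrict__` qualifier (a GCC/clang extension 
in C++).

```cpp
#include <cstddef>

// Hypothetical kernel, for illustration only.
// Without restrict, the compiler has to assume that `out` may
// overlap `a` or `b`.
void add_scaled(float* out, const float* a, const float* b,
                float k, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = a[i] + k * b[i];
}

// Same loop, but the caller promises that the three arrays do
// not overlap.
void add_scaled_restrict(float* __restrict__ out,
                         const float* __restrict__ a,
                         const float* __restrict__ b,
                         float k, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = a[i] + k * b[i];
}
```

Compiling with something like `clang++ -O3 -march=native -S` and 
diffing the assembly of the two functions shows what the 
qualifier changes. For a loop this simple the compiler may just 
emit a runtime overlap check and vectorize both versions, so a 
useful testcase needs a kernel where the difference is clear.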

> -Ofast is also there out of habit, doesn't make a meaningful 
> difference for a benchmark as simple as this. Other switches, 
> like -fno-rtti, -fno-exceptions and even -flto can also be 
> dropped, simply using -O3 -march=native -ffast-math is 
> sufficient to outperform LDC by 2.5x, losing only about 10% 
> from the best C++ performance and producing essentially the 
> same unrolled FMA-enabled assembly with very minor changes.

OK great.
I think you ran into a compiler limitation somehow, so make sure 
you submit the simplified example/testcase on GitHub! ;)
(the simpler you can make it, the better)

Btw, for benchmarking, you should give the `compute_neq` function 
weak linkage, so that the compiler does not do inter-procedural 
optimization for the call to `compute_neq` in `main`. (@weak for 
LDC, clang probably something like __attribute__((weak)))
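
On the C++ side, a minimal sketch of what that could look like 
with GCC/clang. `sum_squares` is a hypothetical stand-in; the 
real `compute_neq` signature is not shown in this thread.

```cpp
#include <cstddef>

// Hypothetical benchmark kernel, for illustration only.
// A weak definition may be replaced by another definition at link
// time, so the compiler cannot inline it into its callers or
// specialize it for the arguments used in main.
__attribute__((weak))
double sum_squares(const double* x, std::size_t n) {
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        s += x[i] * x[i];
    return s;
}
```

The function itself is still optimized as usual; only the 
optimization across the call boundary is blocked, which is what 
you want when timing the kernel in isolation.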
