Performance issue with @fastmath and vectorization

deXtoRious via digitalmars-d-ldc digitalmars-d-ldc at puremagic.com
Sat Nov 12 08:40:27 PST 2016


On Saturday, 12 November 2016 at 16:29:20 UTC, Johan Engelen 
wrote:
> On Saturday, 12 November 2016 at 15:44:28 UTC, deXtoRious wrote:
>>
>> I have not found any way to make LDC perform the same 
>> optimizations as Clang's best case (simply static void, no 
>> weak attribute) and have run out of ideas. Furthermore, I have 
>> no idea why the aforementioned changes in the function 
>> declaration affect both optimizers in this way, or whether 
>> finer control over vectorization/loop unrolling is possible in 
>> LDC. Any thoughts?
>
> I think that perhaps when inlining the fastmath function, some 
> optimization attributes are lost somehow and the inlined code 
> is not optimized as much (you'd have to specify @fastmath on 
> main too).
>
> It'd be easier to compare with -ffast-math I guess ;-)
>
> A look at the generated LLVM IR may provide some clues.

I tried putting @fastmath on main as well; it makes no difference 
whatsoever (the generated assembly is identical). Apart from the 
weirdness of weak/static making far more difference than I would 
intuitively expect, the main factor preventing performance parity 
with Clang seems to be the conservative loop optimizations. Is 
there a way, similar to #pragma unroll in Clang, to tell LDC to 
try to unroll the inner loop?
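In case a concrete snippet helps: here is a minimal, self-contained 
sketch of the kind of @fastmath kernel I mean (not my actual 
benchmark; the function name, loop body, and buffer size are made 
up for illustration). The inner loop is the one I'd like LDC/LLVM 
to unroll and vectorize.

import ldc.attributes : fastmath;

// One of the declaration variants discussed above: a plain
// module-level function marked @fastmath. (ldc.attributes also
// provides @weak for the other variant.)
@fastmath
void scale(float[] data, float factor)
{
    // The hot inner loop that should be unrolled and vectorized.
    foreach (i; 0 .. data.length)
        data[i] = data[i] * factor + 1.0f;
}

void main()
{
    auto buf = new float[](1 << 20);
    buf[] = 1.0f;
    scale(buf, 0.5f);
}

For comparing against Clang I've been building with something like 
"ldc2 -O3 -release -mcpu=native -output-ll test.d" and looking at 
the resulting .ll file and disassembly; -output-ll writes out the 
LLVM IR, which makes it fairly easy to see whether the vectorizer 
kicked in. (Exact flags are just what I use locally.)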
