Performance issue with @fastmath and vectorization
deXtoRious via digitalmars-d-ldc
digitalmars-d-ldc at puremagic.com
Sat Nov 12 08:40:27 PST 2016
On Saturday, 12 November 2016 at 16:29:20 UTC, Johan Engelen
wrote:
> On Saturday, 12 November 2016 at 15:44:28 UTC, deXtoRious wrote:
>>
>> I have not found any way to make LDC perform the same
>> optimizations as Clang's best case (simply static void, no
>> weak attribute) and have run out of ideas. Furthermore, I have
>> no idea why the aforementioned changes in the function
>> declaration affect both optimizers in this way, or whether
>> finer control over vectorization/loop unrolling is possible in
>> LDC. Any thoughts?
>
> I think that perhaps when inlining the fastmath function, some
> optimization attributes are lost somehow and the inlined code
> is not optimized as much (you'd have to specify @fastmath on
> main too).
>
> It'd be easier to compare with -ffast-math I guess ;-)
>
> A look at the generated LLVM IR may provide some clues.
I tried putting @fastmath on main as well; it makes no difference
whatsoever (the generated assembly is identical). Apart from the
weirdness of weak/static making far more difference than I would
intuitively expect, the major factor preventing performance parity
with Clang seems to be the conservative loop optimizations. Is
there a way, similar to #pragma unroll in Clang, to tell LDC to
try to unroll the inner loop?
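For readers following along, the kind of setup being discussed can be
sketched roughly as below. The `@fastmath` and `@weak` attributes do come
from LDC's `ldc.attributes` module; the kernel body itself is a
hypothetical placeholder, not the original benchmark code from this
thread:

```d
// Minimal sketch, assuming LDC with the ldc.attributes module available.
// The loop body is a stand-in; only the attribute usage is the point.
import ldc.attributes : fastmath;

// Marking the hot function @fastmath relaxes IEEE float semantics for
// its operations, which is what lets LLVM vectorize the reduction.
// (Without it, the loop-carried dependence on `sum` blocks reassociation.)
@fastmath
double kernel(const(double)[] xs)
{
    double sum = 0.0;
    foreach (x; xs)      // the inner loop the optimizer should vectorize/unroll
        sum += x * x;
    return sum;
}

void main()
{
    import std.stdio : writeln;
    double[1024] data = 1.0;
    writeln(kernel(data[]));
}
```

To compare what the two compilers actually do, one can dump the IR on
both sides (e.g. `ldc2 -O3 -output-ll` versus `clang -O3 -emit-llvm -S`)
and diff the loop bodies, as suggested above.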