Performance issue with @fastmath and vectorization

Sat Nov 12 10:55:19 PST 2016

On Saturday, 12 November 2016 at 16:40:27 UTC, deXtoRious wrote:
> 
> I tried putting @fastmath on main as well, it makes no 
> difference whatsoever (identical generated assembly).

Yeah, I saw that too. It's a bit strange.

> Apart from the weirdness with weak/static making way more 
> difference than I would intuitively expect,

I am also surprised, but adding `static` in C++ gives the function 
internal linkage. That means it does not need to be emitted as a 
standalone function, and in your case it isn't, because it is fully 
inlined. I added `pragma(inline, true)` to the D function, hoping 
to get a similar effect.
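To illustrate the linkage point, here is a minimal C++ sketch. The function names are hypothetical and not taken from the original benchmark. `dot` has internal linkage, so once its only call site is inlined the compiler is free to drop it entirely. `dot_external` has the default external linkage, so a definition is normally emitted because another translation unit might call it.

```cpp
#include <cstddef>

// Internal linkage: invisible outside this translation unit. After every
// call site is inlined, the compiler need not emit a standalone copy.
static float dot(const float* a, const float* b, std::size_t n) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i)
        sum += a[i] * b[i];
    return sum;
}

// External linkage (the default): a symbol is normally emitted, since
// other translation units may reference it.
float dot_external(const float* a, const float* b, std::size_t n) {
    return dot(a, b, n);  // typically inlined at -O2 and above
}
```

Compiling with `-O2 -S` and comparing the symbols in the assembly is an easy way to check. Whether the `static` copy actually disappears depends on the compiler and optimization level.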

> it seems the major factor preventing performance parity with 
> Clang is the conservative loop optimizations. Is there a way, 
> similar to #pragma unroll in Clang, to tell LDC to try to 
> unroll the inner loop?

There isn't at the moment. We need a mechanism to tag loop 
statements with such metadata. In LLVM IR, the `llvm.loop` 
metadata (e.g. its `llvm.loop.unroll.*` hints) is what you'd want: 
http://llvm.org/docs/LangRef.html#llvm-loop
I am not enough of a D expert to come up with a good way to 
express this in D. Perhaps David can help come up with a solution?
Good stuff for another GitHub issue! ;-)
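For comparison, here is a sketch of what the Clang side looks like. It uses a hypothetical kernel with an inner loop whose trip count is only known at run time. Clang's `#pragma clang loop unroll_count(N)` is lowered to `!llvm.loop` metadata with an `llvm.loop.unroll.count` entry on the loop, which is the kind of per-loop annotation LDC currently has no way to express from D source. The `__clang__` guard is there so other compilers simply skip the hint.

```cpp
#include <cstddef>

// Hypothetical kernel: sums x[r][c] * w[c] over a row-major rows x cols array.
float weighted_sum(const float* x, const float* w,
                   std::size_t rows, std::size_t cols) {
    float total = 0.0f;
    for (std::size_t r = 0; r < rows; ++r) {
#if defined(__clang__)
        // Clang-specific hint: ask LLVM's loop unroller to unroll this
        // inner loop by a factor of 4. Semantics are unchanged.
#pragma clang loop unroll_count(4)
#endif
        for (std::size_t c = 0; c < cols; ++c)
            total += x[r * cols + c] * w[c];
    }
    return total;
}
```

Emitting the unoptimized IR with `clang++ -O0 -S -emit-llvm` should show the hint attached to the inner loop as `llvm.loop.unroll.count` metadata.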