dmd codegen improvements

Wed Aug 19 03:16:17 PDT 2015

On Wednesday, 19 August 2015 at 10:08:48 UTC, ponce wrote:
> Even in video codecs, AVX2 is not that useful and barely brings 
> a 10% improvement over SSE, while you have to be extra careful 
> with the SSE-AVX transition penalty. And to reap this benefit you 
> would have to write in intrinsics/assembly.

Masked-out AVX operations are turned into NOPs, so you can remove 
conditionals from inner loops. The performance of new instructions 
also tends to improve generation by generation.
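
To make the "remove conditionals from inner loops" point concrete, here is 
a rough scalar sketch in plain C (not D, and no intrinsics). The function 
names are made up for illustration, and whether a compiler actually turns 
the second loop into masked or blended vector code depends on the compiler 
and flags, so treat it as an assumption rather than a guarantee.

```c
#include <stddef.h>

/* Branchy version: the conditional sits inside the inner loop. */
float sum_positive_branchy(const float *x, size_t n) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        if (x[i] > 0.0f)
            acc += x[i];
    }
    return acc;
}

/* Predicated version: every iteration does the same work. Elements that
 * fail the test contribute 0, which is the scalar analogue of a lane
 * being masked off. (Hypothetical helper, written for illustration.) */
float sum_positive_masked(const float *x, size_t n) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        float keep = (x[i] > 0.0f) ? x[i] : 0.0f; /* select, not control flow */
        acc += keep;
    }
    return acc;
}
```

Both functions return the same sum. The second form simply has no 
data-dependent control flow left in the loop body, which is the shape a 
vectorizer wants.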

> For AVX-512 I can't even imagine what to use such large 
> registers for. Larger registers => more spilling because of 
> calling conventions, and more fiddling around with complicated 
> shuffle instructions. There are steeply diminishing returns with 
> increasing register size.

You have to plan your data layout, which is why libraries should 
target it, so end users don't have to think too much about it. If 
your computations are trivial, then you are essentially limited by 
memory I/O. SoA (structure-of-arrays) processing isn't really 
limited by shuffling; think of stuff like mapping a pure function 
over a collection of arrays.
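
As a minimal sketch of what I mean by mapping a pure function over a 
collection of arrays, again in plain C: the `Vec3SoA` struct and the 
`dot_self` / `map_dot_self` names are hypothetical and only there for 
illustration. Each field lives in its own contiguous array, so the loop 
reads three streams and writes one, with no shuffling between lanes.

```c
#include <stddef.h>

/* Hypothetical SoA layout: one contiguous array per field. */
typedef struct {
    float *x;
    float *y;
    float *z;
    size_t len;
} Vec3SoA;

/* A pure function of one element's fields: no side effects, no state. */
static inline float dot_self(float x, float y, float z) {
    return x * x + y * y + z * z;
}

/* Map dot_self over the arrays. Element i only touches index i of each
 * input array, so consecutive iterations are independent. */
void map_dot_self(const Vec3SoA *v, float *out) {
    for (size_t i = 0; i < v->len; ++i)
        out[i] = dot_self(v->x[i], v->y[i], v->z[i]);
}
```

With an array-of-structs layout the same loop would need to pull x, y and 
z apart from interleaved memory first, which is where the shuffle cost 
comes from.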