dmd codegen improvements

Wed Aug 19 03:25:13 PDT 2015

On Wednesday, 19 August 2015 at 10:16:18 UTC, Ola Fosheim Grøstad 
wrote:
> On Wednesday, 19 August 2015 at 10:08:48 UTC, ponce wrote:
>> Even in video codecs, AVX2 is not that useful and barely brings 
>> a 10% improvement over SSE, and only if you are extra careful 
>> about the SSE-AVX transition penalty. And to reap this benefit you would 
>> have to write in intrinsics/assembly.
>
> Masked AVX instructions are turned into NOPs. So you can remove 
> conditionals from inner loops. Performance of new instructions 
> tends to improve generation by generation.

Inner loops in video coding already have no conditionals. And for the 
few that do, the conditionals were already removable with existing 
instructions.
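
To illustrate what "removable with existing instructions" means, here is a 
minimal scalar sketch in C of the compare-then-mask pattern. SSE2 can express 
the same idea per lane with a packed compare followed by and/andnot/or. The 
names select_u8 and clamp_max_u8 are made up for this example, and real codec 
code would use intrinsics or assembly rather than this scalar form.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pick a where mask is 0xFF and b where mask is 0x00, without branching.
 * This mirrors the and/andnot/or sequence used with SIMD masks. */
static inline uint8_t select_u8(uint8_t mask, uint8_t a, uint8_t b)
{
    return (uint8_t)((a & mask) | (b & (uint8_t)~mask));
}

/* Clamp every pixel to at most `limit` with no `if` in the loop body.
 * Hypothetical helper, for illustration only. */
void clamp_max_u8(uint8_t *px, size_t n, uint8_t limit)
{
    assert(px != NULL || n == 0);
    for (size_t i = 0; i < n; ++i) {
        /* The comparison yields 0 or 1; negating gives 0x00 or 0xFF. */
        uint8_t mask = (uint8_t)-(px[i] > limit);
        px[i] = select_u8(mask, limit, px[i]);
    }
}
```

Calling clamp_max_u8 on a buffer with limit 128 leaves values at or below 128 
unchanged and replaces larger ones with 128, and the loop body contains no 
data-dependent branch.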

>> For AVX-512 I can't even imagine what to use such large 
>> register for. Larger registers => more spilling because of 
>> calling conventions, and more fiddling around with complicated 
>> shuffle instructions. The returns diminish steeply as 
>> register size increases.
>
> You have to plan your data layout. Which is why libraries 
> should target it, so end users don't have to think too much 
> about it. If your computations are trivial, then you are 
> essentially memory I/O limited. SOA processing isn't really 
> limited by shuffling. Stuff like mapping a pure function over a 
> collection of arrays.

I stand by what I know and have measured: precious few things are 
sped up by AVX-xxx. It is almost always better to invest that time 
optimizing somewhere else.