dmd codegen improvements
ponce via Digitalmars-d
digitalmars-d at puremagic.com
Wed Aug 19 03:25:13 PDT 2015
On Wednesday, 19 August 2015 at 10:16:18 UTC, Ola Fosheim Grøstad
wrote:
> On Wednesday, 19 August 2015 at 10:08:48 UTC, ponce wrote:
>> Even in video codecs, AVX2 is not that useful and barely brings
>> a 10% improvement over SSE, while requiring extra care with the
>> SSE-AVX transition penalty. And to reap this benefit you would
>> have to write in intrinsics/assembly.
>
> Masked AVX instructions are turned into NOPs. So you can remove
> conditionals from inner loops. Performance of new instructions
> tends to improve generation by generation.
Loops in video coding already have no conditionals. And for the
ones that do, the conditionals were already removable with existing
instructions.
>> For AVX-512 I can't even imagine what to use such large
>> registers for. Larger registers => more spilling because of
>> calling conventions, and more fiddling around with complicated
>> shuffle instructions. There are steeply diminishing returns
>> with increasing register size.
>
> You have to plan your data layout. Which is why libraries
> should target it, so end users don't have to think too much
> about it. If your computations are trivial, then you are
> essentially memory I/O limited. SOA processing isn't really
> limited by shuffling. Stuff like mapping a pure function over a
> collection of arrays.
I stand by what I know and have measured: as I said previously, few
things are sped up by AVX-xxx. It is almost always better to invest
that time in optimizing somewhere else.