toy windowing auto-vec miss

Bruce Carneal bcarneal at gmail.com
Mon Nov 7 14:46:51 UTC 2022


On Monday, 7 November 2022 at 09:56:13 UTC, rikki cattermole 
wrote:
> This might be a bit naive, but ldc's output is about a quarter 
> smaller, it uses significantly less jumps.
>
> Is gdc actually faster?

If you have long enough inputs, yes.  A vectorized version 
quickly amortizes its extra instruction stream overhead, after 
which the performance advantage trends toward N:1, with N being 
the number of elements handled per vector operation.
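
As a rough illustration of that trend, here is a minimal cost-model 
sketch in C.  It is not a measurement: the function name 
speedup_estimate, the per-element cost of one unit, and the fixed 
overhead term are all assumptions made up for the example, and 
"lanes" stands for the number of elements handled per vector 
operation.

```c
#include <stddef.h>

/*
 * Hypothetical cost model (illustrative only, not measured):
 *   scalar cost     = n                      one unit per element
 *   vectorized cost = overhead + n / lanes   fixed setup/tail cost plus
 *                                            one unit per vector of elements
 * Returns the estimated speedup, i.e. scalar cost / vectorized cost.
 */
double speedup_estimate(size_t n, double overhead, size_t lanes)
{
    double scalar_cost = (double)n;
    double vector_cost = overhead + (double)n / (double)lanes;
    return scalar_cost / vector_cost;
}
```

With the made-up values overhead = 20 and lanes = 8, the model gives 
roughly 0.73 for n = 16 (the scalar loop wins) and roughly 7.70 for 
n = 4096, approaching 8 as n grows.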

As you imply, measurement trumps in-one's-head modelling.  I'll 
measure and report on the exact toy code later today, but 
real-world code with the same "simple but not trivial" operand 
pattern, involving Bayer/CFA data, has been measured and the 
performance gap verified.  For that code the workaround was 
manual __vector-ization and use of a shuffle intrinsic.
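
The real-world kernel is not shown here, so the following is only a 
hypothetical stand-in for the general technique: explicit vector 
types plus a shuffle, written in C with the GCC/Clang vector 
extensions rather than D's __vector.  The function split_row, the 
16-bit sample type, and the even/odd split of a Bayer row are all 
assumptions chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Eight 16-bit samples in one 16-byte vector (GCC/Clang extension). */
typedef uint16_t v8u16 __attribute__((vector_size(16)));

/* Select lanes first, first+2, ... from the 16-lane concatenation (a, b). */
static v8u16 even_lanes(v8u16 a, v8u16 b)
{
#if defined(__clang__)
    return __builtin_shufflevector(a, b, 0, 2, 4, 6, 8, 10, 12, 14);
#else
    const v8u16 mask = {0, 2, 4, 6, 8, 10, 12, 14};
    return __builtin_shuffle(a, b, mask);
#endif
}

static v8u16 odd_lanes(v8u16 a, v8u16 b)
{
#if defined(__clang__)
    return __builtin_shufflevector(a, b, 1, 3, 5, 7, 9, 11, 13, 15);
#else
    const v8u16 mask = {1, 3, 5, 7, 9, 11, 13, 15};
    return __builtin_shuffle(a, b, mask);
#endif
}

/*
 * Hypothetical kernel: split one Bayer row (samples alternating between
 * two colours) into its even-indexed and odd-indexed samples.
 * n is assumed to be even; even[] and odd[] each hold n / 2 samples.
 */
void split_row(const uint16_t *row, size_t n, uint16_t *even, uint16_t *odd)
{
    size_t i = 0;

    /* Vector body: 16 input samples per iteration. */
    for (; i + 16 <= n; i += 16) {
        v8u16 a, b, e, o;
        memcpy(&a, row + i, sizeof a);      /* unaligned-safe loads */
        memcpy(&b, row + i + 8, sizeof b);
        e = even_lanes(a, b);
        o = odd_lanes(a, b);
        memcpy(even + i / 2, &e, sizeof e);
        memcpy(odd + i / 2, &o, sizeof o);
    }

    /* Scalar tail for the remaining sample pairs. */
    for (; i + 2 <= n; i += 2) {
        even[i / 2] = row[i];
        odd[i / 2] = row[i + 1];
    }
}
```

Writing the shuffle out by hand like this removes the dependence on 
the auto-vectorizer recognizing the strided access pattern, at the 
cost of compiler-specific builtins.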



More information about the digitalmars-d-ldc mailing list