toy windowing auto-vec miss
Bruce Carneal
bcarneal at gmail.com
Mon Nov 7 14:46:51 UTC 2022
On Monday, 7 November 2022 at 09:56:13 UTC, rikki cattermole
wrote:
> This might be a bit naive, but ldc's output is about a quarter
> smaller, it uses significantly less jumps.
>
> Is gdc actually faster?
If you have long enough inputs, yes. A vectorized version
overcomes the instruction stream overhead quickly, after which the
performance advantage trends toward N:1, where N is the number of
vector lanes.
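A back-of-envelope model of that trend, as a sketch only. The symbols are my own assumptions, not measurements: c is the scalar cost per element, N the number of elements handled per vector operation, o a fixed overhead paid once by the vectorized version, and n the input length.

```latex
S(n) \;=\; \frac{c\,n}{o + \dfrac{c\,n}{N}}
     \;=\; \frac{N}{1 + \dfrac{o\,N}{c\,n}}
\;\xrightarrow[\;n \to \infty\;]{}\; N,
\qquad
S(n) > 1 \iff n > \frac{o\,N}{c\,(N-1)}
```

Under these assumptions the vectorized version breaks even once n exceeds oN/(c(N-1)), and the speedup S(n) approaches N as the fixed overhead is amortized over longer inputs.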
As you imply, measurement trumps in-one's-head modelling. I'll
measure and report on the exact toy code later today, but
real-world code with the same "simple but not trivial" operand
pattern, involving Bayer/CFA data, has been measured and the
performance gap verified. For that code the workaround was
manual __vector-ization and use of a shuffle intrinsic.
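To illustrate the general shape of such a workaround, here is a minimal sketch. It is not the measured code and not the toy code from this thread. It is written in C with the GCC/Clang vector extensions rather than D, and the access pattern (taking every other sample of an interleaved row, as one colour of a Bayer row might be laid out) is an assumption chosen for illustration. The names `v4si`, `even_samples_scalar` and `even_samples_vector` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 4 x int32 vector via the GCC/Clang vector extension (hypothetical name). */
typedef int32_t v4si __attribute__((vector_size(16)));

/* Scalar reference: copy every other sample of `row` into `out`.
   Assumed pattern: one colour of an interleaved two-colour row. */
static void even_samples_scalar(const int32_t *row, int32_t *out, size_t n_out)
{
    for (size_t i = 0; i < n_out; ++i)
        out[i] = row[2 * i];
}

/* Hand-vectorized variant: two vector loads plus one shuffle
   produce four outputs per iteration. */
static void even_samples_vector(const int32_t *row, int32_t *out, size_t n_out)
{
    assert(n_out % 4 == 0); /* sketch only: no tail handling */
    for (size_t i = 0; i < n_out; i += 4) {
        v4si lo, hi, even;
        /* memcpy keeps the loads and stores alignment-safe. */
        memcpy(&lo, row + 2 * i, sizeof lo);
        memcpy(&hi, row + 2 * i + 4, sizeof hi);
        /* Select elements 0, 2, 4, 6 of the concatenation lo:hi. */
#if defined(__clang__)
        even = __builtin_shufflevector(lo, hi, 0, 2, 4, 6);
#else
        even = __builtin_shuffle(lo, hi, (v4si){0, 2, 4, 6});
#endif
        memcpy(out + i, &even, sizeof even);
    }
}
```

Both functions should produce identical output for any `n_out` that is a multiple of 4. Whether the explicit shuffle actually beats what the auto-vectorizer emits is something to measure per compiler and target, not assume.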
More information about the digitalmars-d-ldc
mailing list