Optimizing for SIMD: best practices? (i.e. what features are allowed?)
tsbockman
thomas.bockman at gmail.com
Sun Mar 7 23:06:30 UTC 2021
On Sunday, 7 March 2021 at 18:00:57 UTC, z wrote:
> On Friday, 26 February 2021 at 03:57:12 UTC, tsbockman wrote:
>>>> static foreach(size_t i; 0 .. 3/+typeof(a).length+/){
>>>> distance += a[i].abs;//abs required by the caller
>>
>> (a * a) above is always positive for real numbers. You don't
>> need the call to abs unless you're trying to guarantee that
>> even nan values will have a clear sign bit.
>>
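To illustrate the point above, here's a minimal sketch (with made-up
values, just for demonstration) showing that the square of any real
is already non-negative, so a following `abs` changes nothing except
for nan:

```d
import std.math : abs, isNaN;

void main()
{
    // The square of any real is non-negative, so a subsequent
    // abs is redundant (sample values chosen for illustration).
    foreach (x; [-3.0, 0.0, 2.5])
    {
        immutable sq = x * x;
        assert(sq >= 0.0);
        assert(abs(sq) == sq); // abs is a no-op on a square
    }

    // The one exception: nan. Squaring a nan yields nan, whose
    // sign bit is unspecified; abs would clear the sign bit,
    // but the result is still nan either way.
    immutable n = double.nan * double.nan;
    assert(n.isNaN);
}
```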
> I do not know why but the caller's performance nosedives
> whenever there is no .abs at this particular line. (There's a
> 3x difference, no joke.)
My way is definitely (slightly) better; something is going wrong
in either the caller or the optimizer. Show me the code for the
caller and maybe I can figure it out.
Perhaps the compiler is performing a value range propagation
(VRP) based optimization in the caller, but isn't smart enough to
figure out that the value is already always positive without the
`abs` call? I've run into that specific problem before.
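As a sketch of what I mean (the function names are hypothetical,
just for illustration): both of these compute the same sum of
squares, but the `fabs` variant hands the optimizer an explicit
non-negativity guarantee that its VRP might otherwise fail to
derive from the multiplication alone:

```d
import std.math : fabs;

// Hypothetical reductions for illustration: identical results,
// but the fabs call spells out a guarantee the optimizer may
// not infer on its own.
double sumSquaresPlain(const double[] a)
{
    double distance = 0.0;
    foreach (x; a)
        distance += x * x;       // provably >= 0, but VRP may miss it
    return distance;
}

double sumSquaresAbs(const double[] a)
{
    double distance = 0.0;
    foreach (x; a)
        distance += fabs(x * x); // redundant, yet may unlock a fast path
    return distance;
}

void main()
{
    const double[] v = [1.0, -2.0, 3.0];
    assert(sumSquaresPlain(v) == 14.0);
    assert(sumSquaresAbs(v) == 14.0);
}
```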
Alternatively, sometimes trivial changes to the code that
*shouldn't* matter make the difference between hitting a smart
path in the optimizer, and a dumb one. Automatic SIMD
optimization is quite sensitive and temperamental.
Either way, the problem can be fixed by figuring out what
optimization the compiler is doing when it knows that `distance`
is positive, and just doing it manually.
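For example, if the win turns out to be the compiler eliding some
pre-check before a square root, you can do that manually by calling
the `core.math.sqrt` intrinsic (which lowers directly to the
hardware instruction) on a value you know is non-negative by
construction. A minimal sketch, with a made-up input:

```d
import core.math : sqrt; // intrinsic: lowers straight to the sqrt instruction

void main()
{
    // Non-negative by construction (a sum of squares), so no
    // abs or negative-input handling is needed before sqrt.
    immutable double distanceSquared = 14.0;
    immutable d = sqrt(distanceSquared);
    assert(d > 3.74 && d < 3.75); // sqrt(14) is about 3.7417
}
```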
> Same for assignment instead of addition, but with a 2x
> difference instead.
Did you fix the nan bug I pointed out earlier? More generally,
are you actually verifying the correctness of the results in any
way for each alternative implementation? Because you can get big
speedups sometimes from buggy code when the compiler realizes
that some later logic error makes earlier code irrelevant, but
that doesn't mean the buggy code is better...
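A cheap way to guard against that: before timing anything, assert
that every variant agrees with a simple reference implementation.
A minimal sketch (the two function names are placeholders, not your
actual code):

```d
import std.math : isClose, sqrt;

// Hypothetical reference and "optimized" versions of the same
// 3-component distance, for illustration only.
double distanceRef(const double[3] a)
{
    return sqrt(a[0] * a[0] + a[1] * a[1] + a[2] * a[2]);
}

double distanceFast(const double[3] a)
{
    double sum = 0.0;
    static foreach (i; 0 .. 3)
        sum += a[i] * a[i];
    return sqrt(sum);
}

void main()
{
    // Check agreement before trusting any benchmark numbers: a
    // fast version that computes the wrong answer is meaningless.
    const double[3][] cases =
        [[1.0, 2.0, 2.0], [0.0, 0.0, 0.0], [-3.0, 4.0, 0.0]];
    foreach (c; cases)
        assert(isClose(distanceRef(c), distanceFast(c)));
}
```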
More information about the Digitalmars-d-learn
mailing list