SIMD implementation of dot-product. Benchmarks
Ilya Yaroshenko
ilyayaroshenko at gmail.com
Sat Aug 17 21:39:09 PDT 2013
On Sunday, 18 August 2013 at 01:53:53 UTC, Manu wrote:
> It doesn't look like you account for alignment.
> This is basically not-portable (I doubt unaligned loads in this context are
> faster than performing scalar operations), and possibly inefficient on x86
> too.
dotProduct uses unaligned loads (__builtin_ia32_loadups256,
__builtin_ia32_loadupd256), and it is up to 21 times faster than the
trivial scalar version.
Why are unaligned loads non-portable and inefficient?
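The post does not show the D source of dotProduct. The following is only a minimal sketch of the same idea, a dot product that uses nothing but unaligned 256-bit loads. It is written in C, and GCC/Clang generic vector extensions stand in for the GDC builtins named above. The names `v4df` and `dot_unaligned` are hypothetical, and only the double (loadupd256) case is shown.

```c
#include <stddef.h>
#include <string.h>

/* Four doubles = 256 bits, one AVX register. GCC/Clang vector extension;
   may_alias mirrors how GCC declares its own __m256d. On targets without
   256-bit SIMD the compiler lowers these operations to narrower or scalar code. */
typedef double v4df __attribute__((vector_size(32), __may_alias__));

/* Hypothetical dot product using only unaligned vector loads:
   neither a nor b needs any particular alignment. */
double dot_unaligned(const double *a, const double *b, size_t n)
{
    v4df acc = {0.0, 0.0, 0.0, 0.0};
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        v4df va, vb;
        /* memcpy into a vector local expresses an unaligned load portably;
           with -mavx compilers typically emit vmovupd for it. */
        memcpy(&va, a + i, sizeof va);
        memcpy(&vb, b + i, sizeof vb);
        acc += va * vb;          /* four products accumulated per iteration */
    }

    /* Horizontal sum of the four lanes, then a scalar tail for n % 4. */
    double sum = acc[0] + acc[1] + acc[2] + acc[3];
    for (; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}
```

Build with, for example, `gcc -O2 -mavx` to get 256-bit instructions on x86. Without `-mavx` the same source still compiles, because the compiler lowers the generic vectors itself.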
> To make it account for potentially random alignment will be awkward, but it
> might be possible to do efficiently.
Did you mean using unaligned loads, or preparing the data for aligned
loads at the beginning of the function?
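The second option in the question could look roughly like the following. This is a sketch under the same assumptions as a C stand-in for the D code, with hypothetical names. A scalar prologue advances until `a + i` reaches a 32-byte boundary, the main loop then does aligned loads from `a`, and a scalar epilogue handles the remainder. Only one input can be aligned this way. `b` is aligned too only if it has the same offset modulo 32 bytes, so it still uses an unaligned load.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef double v4df __attribute__((vector_size(32), __may_alias__));

/* Hypothetical dot product with a scalar prologue that peels elements
   until a + i is 32-byte aligned, so the main loop can load a aligned. */
double dot_peeled(const double *a, const double *b, size_t n)
{
    double sum = 0.0;
    size_t i = 0;

    /* Prologue: at most 3 scalar steps for 8-byte-aligned doubles;
       the i < n bound also covers very short inputs. */
    while (i < n && (uintptr_t)(a + i) % 32 != 0) {
        sum += a[i] * b[i];
        i++;
    }

    v4df acc = {0.0, 0.0, 0.0, 0.0};
    for (; i + 4 <= n; i += 4) {
        /* a + i is 32-byte aligned here, so it can be read directly as a
           v4df; the compiler may use an aligned load such as vmovapd. */
        v4df va = *(const v4df *)(a + i);
        /* b shares a's alignment only by coincidence: load it unaligned. */
        v4df vb;
        memcpy(&vb, b + i, sizeof vb);
        acc += va * vb;
    }

    /* Horizontal sum, then the scalar epilogue for the leftover elements. */
    sum += acc[0] + acc[1] + acc[2] + acc[3];
    for (; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}
```

The prologue costs at most three scalar iterations. Whether the aligned main loop actually beats the all-unaligned version depends on the CPU, so both variants would have to be benchmarked.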