SIMD implementation of dot-product. Benchmarks
Ilya Yaroshenko
ilyayaroshenko at gmail.com
Sat Aug 17 22:31:25 PDT 2013
On Sunday, 18 August 2013 at 05:26:00 UTC, Manu wrote:
> movups is not good. It'll be a lot faster (and portable) if you
> use movaps.
>
> Process looks something like:
> * do the first few from a[0] until a's alignment interval as scalar
> * load the left of b's aligned pair
> * loop for each aligned vector in a
>   - load a[n..n+4] aligned
>   - load the right of b's pair
>   - combine left~right and shift left to match elements against a
>   - left = right
> * perform stragglers as scalar
>
> Your benchmark is probably misleading too, because I suspect you are
> passing directly alloc-ed arrays into the function (which are 16 byte
> aligned).
> movups will be significantly slower if the pointers supplied are not
> 16 byte aligned.
> Also, results vary significantly between chip manufacturers and
> revisions.
I'll try =). Thank you very much!
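
For reference, here is a rough sketch of how the process quoted above might look. It is written in C with SSE2 intrinsics rather than D, purely for illustration. The names dot_aligned and combine are made up for this sketch. It assumes both arrays hold ordinary 4-byte-aligned floats, and it does a few extra elements as scalar so that the aligned loads of b never leave the array; that bounds handling is an addition of this sketch, not part of the quoted outline. It has not been benchmarked.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: build [left[k..3], right[0..k-1]] from two adjacent
   aligned vectors of b. The byte-shift intrinsics need compile-time
   constants, hence the switch. Only called with k in 1..3. */
static __m128 combine(__m128 left, __m128 right, unsigned k)
{
    __m128i l = _mm_castps_si128(left), r = _mm_castps_si128(right);
    switch (k) {
    case 1:  return _mm_castsi128_ps(_mm_or_si128(_mm_srli_si128(l, 4),  _mm_slli_si128(r, 12)));
    case 2:  return _mm_castsi128_ps(_mm_or_si128(_mm_srli_si128(l, 8),  _mm_slli_si128(r, 8)));
    default: return _mm_castsi128_ps(_mm_or_si128(_mm_srli_si128(l, 12), _mm_slli_si128(r, 4)));
    }
}

/* Hypothetical name: dot product using only aligned (movaps) loads. */
float dot_aligned(const float *a, const float *b, size_t n)
{
    float sum = 0.0f;
    size_t i = 0;

    /* Scalar head: advance until a + i sits on a 16-byte boundary. */
    while (i < n && ((uintptr_t)(a + i) & 15) != 0) {
        sum += a[i] * b[i];
        ++i;
    }

    size_t off = (uintptr_t)(b + i) & 15; /* byte misalignment of b at this point */
    __m128 acc = _mm_setzero_ps();

    if (off == 0) {
        /* Both pointers aligned: plain aligned loads on each side. */
        for (; i + 4 <= n; i += 4)
            acc = _mm_add_ps(acc, _mm_mul_ps(_mm_load_ps(a + i), _mm_load_ps(b + i)));
    } else if (off % 4 == 0) {
        unsigned k = (unsigned)(off / 4); /* b is k floats past an aligned block */

        /* Stepping back k floats would leave the array: do one more block as scalar. */
        if (i < k) {
            size_t stop = (i + 4 <= n) ? i + 4 : n;
            for (; i < stop; ++i)
                sum += a[i] * b[i];
        }
        if (i >= k && i - k + 8 <= n) {
            const float *bp = b + (i - k);  /* aligned block containing b[i] */
            __m128 left = _mm_load_ps(bp);  /* left half of b's aligned pair */
            for (; i - k + 8 <= n; i += 4, bp += 4) {
                __m128 right = _mm_load_ps(bp + 4);     /* right half of the pair */
                __m128 bv = combine(left, right, k);    /* equals b[i..i+4] */
                acc = _mm_add_ps(acc, _mm_mul_ps(_mm_load_ps(a + i), bv));
                left = right;
            }
        }
    }
    /* If b is not even 4-byte aligned, everything falls through to the scalar tail. */

    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    sum += lanes[0] + lanes[1] + lanes[2] + lanes[3];

    /* Stragglers as scalar. */
    for (; i < n; ++i)
        sum += a[i] * b[i];
    return sum;
}
```

To check the benchmark concern from the quoted message, one could call this with pointers offset by one to three floats from a 16-byte-aligned allocation instead of passing the freshly allocated arrays directly. The switch inside combine is kept for brevity; a tuned version would probably use a separate loop per shift amount.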