[Issue 4393] Very good dotProduct

d-bugmail at puremagic.com d-bugmail at puremagic.com
Thu Apr 28 21:40:27 PDT 2011


http://d.puremagic.com/issues/show_bug.cgi?id=4393


Don <clugdbug at yahoo.com.au> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |clugdbug at yahoo.com.au


--- Comment #4 from Don <clugdbug at yahoo.com.au> 2011-04-28 21:36:38 PDT ---
Did you test this on Intel, or AMD? Blas1 code is generally limited by memory
access, and AMD has two load ports, so it has different bottlenecks and in
these operations always does better than Intel.

See also a discussion at:
http://www.bealto.com/mp-dot_sse.html
(I've talked to Eric Bealto before, he's happy for Phobos to use any of his
stuff if we see anything we want). It's a bit misleading, though, because above
a certain length, you become dominated by cache effects, so I don't know if
unrolling by 4 is actually worthwhile in practice.

I also have some optimized x87 dot product code (AMD 32 bit CPUs don't have
SSE2, so they still need x87 for doubles).

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------


More information about the Digitalmars-d-bugs mailing list