Rather Bizarre slow downs using Complex!float with avx (ldc).
james.p.leblanc
james.p.leblanc at gmail.com
Thu Sep 30 16:40:03 UTC 2021
D-Ers,
I have been getting counterintuitive results on avx/no-avx timing
experiments. Storyline to date (notes at end):
**Experiment #1)** Real float data type (i.e. non-complex numbers), speed comparison.
a) Moving from non-avx --> avx shows an unrealistic-looking speed up of 15-25X.
b) This is weird, but the story continues ...
**Experiment #2)** Real double data type (non-complex numbers):
a) Moving from non-avx --> avx again shows amazing gains, but the gains are about half of those seen in Experiment #1, so maybe this looks plausible?
**Experiment #3)** Complex!float data type:
a) Now **going from non-avx to avx shows a serious performance LOSS** of 40%, breaking even at best. What is happening here?
**Experiment #4)** Complex!double:
a) Non-avx --> avx shows performance gains again, of about 2X (so the gains appear to be reasonable).
The main question I have is:
**"What is going on with the Complex!float performance?"** One might expect floats to perform better than doubles, as we saw with the real-valued data (because of vector packing, memory bandwidth, etc.).
But **Complex!float shows MUCH WORSE avx performance than Complex!double (by a factor of almost 4).**
```d
// Table of Computation Times
//
//        self math               std math
//   explicit   no-explicit   explicit   no-explicit
//    align        align       align        align
//     0.12        0.21         0.15        0.21     ; # Float with AVX
//     3.23        3.24         3.30        3.22     ; # Float without AVX
//     0.31        0.42         0.31        0.42     ; # Double with AVX
//     3.25        3.24         3.24        3.27     ; # Double without AVX
//     6.42        6.62         6.61        6.59     ; # Complex!float with AVX
//     4.04        4.17         6.68        5.82     ; # Complex!float without AVX
//     1.67        1.69         1.73        1.71     ; # Complex!double with AVX
//     3.34        3.42         3.28        3.31     ; # Complex!double without AVX
```
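For reference, the speed-up factors quoted in the experiments can be recomputed from the table. The following is only a minimal sketch using the "self math, explicit align" column; the helper name `speedup` is made up for illustration and is not part of the benchmark.

```d
import std.stdio : writefln;

/// Speed-up factor when going from the non-AVX build to the AVX build.
/// Values above 1 mean AVX is faster; values below 1 mean AVX is slower.
/// (Hypothetical helper, for illustration only.)
double speedup(double timeWithoutAvx, double timeWithAvx)
{
    return timeWithoutAvx / timeWithAvx;
}

void main()
{
    // Timings copied from the "self math, explicit align" column of the table.
    writefln("float:          %.2fx", speedup(3.23, 0.12));
    writefln("double:         %.2fx", speedup(3.25, 0.31));
    writefln("Complex!float:  %.2fx", speedup(4.04, 6.42));
    writefln("Complex!double: %.2fx", speedup(3.34, 1.67));

    // Complex!float versus Complex!double, both with AVX.
    writefln("Complex!float / Complex!double (AVX): %.2f", 6.42 / 1.67);
}
```

Only the Complex!float row gives a factor below 1, i.e. the AVX build is the slower one there.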
Notes:
1) Based on forum hints from ldc experts, I got good guidance on enabling avx (i.e. compiling modules on the command line, using --fast-math and -mcpu=haswell).
2) From Mir-glas experts I received hints to try implementing my own version of the complex math (this is what the "self math" column refers to).
I understand that the details of the computations are not included here. (I can add them if there is interest, and if I figure out an effective way to present them in a forum.)
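To make the "self math" versus "std math" distinction concrete for readers, here is a hypothetical sketch of what such a comparison could look like. It is not the benchmark behind the table: the kernel (an element-wise multiply-accumulate), the function names `mulAccSelf` and `mulAccStd`, the array size, and the repetition count are all assumptions chosen for illustration.

```d
import std.complex : Complex;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writefln;

/// "self math" (assumed form): acc[i] += a[i] * b[i], written out on the
/// real and imaginary parts instead of using Complex's operators.
void mulAccSelf(T)(const(Complex!T)[] a, const(Complex!T)[] b, Complex!T[] acc)
{
    foreach (i; 0 .. acc.length)
    {
        immutable re = a[i].re * b[i].re - a[i].im * b[i].im;
        immutable im = a[i].re * b[i].im + a[i].im * b[i].re;
        acc[i].re += re;
        acc[i].im += im;
    }
}

/// "std math" (assumed form): the same update via std.complex operators.
void mulAccStd(T)(const(Complex!T)[] a, const(Complex!T)[] b, Complex!T[] acc)
{
    foreach (i; 0 .. acc.length)
        acc[i] += a[i] * b[i];
}

void main()
{
    enum n = 4096;      // illustrative array length
    enum reps = 10_000; // illustrative repetition count

    auto a = new Complex!float[n];
    auto b = new Complex!float[n];
    auto acc = new Complex!float[n];
    a[] = Complex!float(1.0f, 0.5f);
    b[] = Complex!float(0.5f, -0.25f);
    acc[] = Complex!float(0.0f, 0.0f); // default init would be NaN

    auto sw = StopWatch(AutoStart.yes);
    foreach (rep; 0 .. reps)
        mulAccSelf!float(a, b, acc);
    sw.stop();

    // Printing a result keeps the optimizer from discarding the loop.
    writefln("self math: %s ms (acc[0].re = %s)", sw.peek.total!"msecs", acc[0].re);
}
```

Building the same file once with and once without `-mcpu=haswell` (for example `ldc2 -O3 -mcpu=haswell file.d`), and swapping `float` for `double` or `mulAccSelf` for `mulAccStd`, would give the eight combinations that make up the Complex rows of the table.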
But I thought I might begin with a simple question: **"Is there some well-known issue that I am missing here?" Have others been down this road as well?**
Thanks for any and all input.
Best Regards,
James
PS Sorry for the inelegant table ... I do not believe there is a way to include beautiful bar charts on this forum. (Please correct me if there is a way...)