Rather bizarre slowdowns using Complex!float with AVX (LDC)

Thu Sep 30 16:40:03 UTC 2021

D-Ers,

I have been getting counterintuitive results in AVX/non-AVX timing
experiments. The story so far (notes at the end):

**Experiment #1)** Real float data type (i.e., non-complex numbers),
speed comparison:
   a)  moving from non-AVX --> AVX shows an unrealistic speedup of
       roughly 15-27X.
   b)  this is weird, but the story continues ...

**Experiment #2)** Real double data type (non-complex numbers):
   a)  moving from non-AVX --> AVX again shows amazing gains, but the
       gains are about half of those seen in Experiment #1, so maybe
       this is plausible?

**Experiment #3)** Complex!float data type:
   a)  now **going from non-AVX to AVX shows a serious performance LOSS**,
       ranging from 40% to breaking even at best. What is happening here?

**Experiment #4)** Complex!double data type:
   a)  non-AVX --> AVX shows performance gains again, about 2X (so the
       gains appear to be reasonable).

The main question I have is:

**"What is going on with the Complex!float performance?"** One might expect
floats to perform better than doubles, as we saw with the real-valued data
(because of vector packing, memory bandwidth, etc.).

But **Complex!float shows MUCH WORSE AVX performance than Complex!double
(by a factor of almost 4).**

```d
//            Table of Computation Times
//
//       self math              std math
// explicit  no-explicit   explicit  no-explicit
//   align      align        align      align
//   0.12       0.21          0.15      0.21 ;  # Float with AVX
//   3.23       3.24          3.30      3.22 ;  # Float without AVX
//   0.31       0.42          0.31      0.42 ;  # Double with AVX
//   3.25       3.24          3.24      3.27 ;  # Double without AVX
//   6.42       6.62          6.61      6.59 ;  # Complex!float with AVX
//   4.04       4.17          6.68      5.82 ;  # Complex!float without AVX
//   1.67       1.69          1.73      1.71 ;  # Complex!double with AVX
//   3.34       3.42          3.28      3.31    # Complex!double without AVX
```
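To make the ratios quoted in the experiments easy to check, here is a small D sketch that recomputes the AVX speedup (time without AVX divided by time with AVX) from the "self math, explicit align" column of the table. Only the numbers come from the table; the `Row` struct and `speedup` function are illustrative names, not part of any library. A value below 1 means the AVX build was slower.

```d
import std.stdio : writefln;

/// One row pair from the table: times for the same data type with and
/// without AVX (self math, explicit align column).
struct Row
{
    string label;
    double withAvx;
    double withoutAvx;
}

/// Speedup of the AVX build; a value below 1 means AVX was slower.
double speedup(const Row r) pure nothrow @nogc @safe
{
    return r.withoutAvx / r.withAvx;
}

immutable Row[] rows = [
    Row("float",          0.12, 3.23),
    Row("double",         0.31, 3.25),
    Row("Complex!float",  6.42, 4.04),
    Row("Complex!double", 1.67, 3.34),
];

void main()
{
    foreach (r; rows)
        writefln("%-15s AVX speedup: %6.2fx", r.label, speedup(r));

    // Complex!float vs Complex!double, both with AVX.
    writefln("Complex!float / Complex!double time with AVX: %.2f",
             rows[2].withAvx / rows[3].withAvx);
}
```

With these values float comes out at about 26.9X, double at about 10.5X, Complex!float at about 0.63X (the AVX build is slower), and Complex!double at 2.0X; the Complex!float/Complex!double time ratio with AVX is about 3.8.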

Notes:

1) Based on forum hints from LDC experts, I got good guidance
    on enabling AVX (i.e., compiling the modules on the command line
    with --fast-math and -mcpu=haswell).

2) From Mir-glas experts I received hints to try implementing my own
    version of the complex math (this is what the "self math" column
    refers to).
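The post does not show how the "self math" version was written. As a purely hypothetical sketch, one common way to hand-roll complex arithmetic is to keep the real and imaginary parts in separate float arrays (a split layout) instead of an array of Complex!float, so the inner loop is plain float arithmetic that an auto-vectorizer may find easier to map onto SIMD lanes. `mulStd` and `mulSplit` are made-up names; `std.complex` is the Phobos module.

```d
import std.complex : Complex, complex;
import std.stdio : writeln;

/// Reference version: element-wise product using std.complex.
void mulStd(const(Complex!float)[] a, const(Complex!float)[] b,
            Complex!float[] c)
{
    assert(a.length == b.length && b.length == c.length);
    foreach (i; 0 .. a.length)
        c[i] = a[i] * b[i];
}

/// Hypothetical "self math" version with a split layout: real and
/// imaginary parts live in separate float arrays, so the loop body is
/// plain float arithmetic.
/// (ar + ai*i)(br + bi*i) = (ar*br - ai*bi) + (ar*bi + ai*br)i
void mulSplit(const float[] ar, const float[] ai,
              const float[] br, const float[] bi,
              float[] cr, float[] ci)
{
    assert(ar.length == ai.length && ar.length == br.length
        && ar.length == bi.length && ar.length == cr.length
        && ar.length == ci.length);
    foreach (k; 0 .. ar.length)
    {
        cr[k] = ar[k] * br[k] - ai[k] * bi[k];
        ci[k] = ar[k] * bi[k] + ai[k] * br[k];
    }
}

void main()
{
    // Interleaved layout: an array of Complex!float.
    auto a = [complex(1.0f, 2.0f), complex(3.0f, -1.0f)];
    auto b = [complex(0.5f, 4.0f), complex(-2.0f, 1.5f)];
    auto c = new Complex!float[](a.length);
    mulStd(a, b, c);

    // The same data in split layout.
    float[] ar = [1.0f, 3.0f];
    float[] ai = [2.0f, -1.0f];
    float[] br = [0.5f, -2.0f];
    float[] bi = [4.0f, 1.5f];
    auto cr = new float[](ar.length);
    auto ci = new float[](ar.length);
    mulSplit(ar, ai, br, bi, cr, ci);

    writeln(c);        // std.complex result
    writeln(cr, ci);   // split-layout result: real parts, imaginary parts
}
```

Both functions compute the same products; whether the split layout actually vectorizes better with -mcpu=haswell would have to be confirmed by timing it or by inspecting the generated assembly.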

I understand that the details of the computations are not included here. (I
can provide them if there is interest, and if I figure out an effective way to
present them in a forum.)
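For anyone wanting to reproduce the effect, a minimal timing harness might look like the following. It is a sketch under stated assumptions: the kernel (an element-wise multiply), the array length, and the repetition count are hypothetical and are not the computations behind the table. It uses `StopWatch` from `std.datetime.stopwatch` and writes the last result through an `out` parameter so the computed values are used, which makes it less likely that the optimizer discards the loop. The idea is to build it once with and once without the AVX flags from note 1 and compare the printed times.

```d
import core.time : Duration;
import std.complex : Complex;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writefln;

/// Hypothetical kernel: element-wise product c[i] = a[i] * b[i].
void kernel(T)(const T[] a, const T[] b, T[] c)
{
    foreach (i; 0 .. a.length)
        c[i] = a[i] * b[i];
}

/// Runs `kernel` `reps` times on arrays of length `n` and returns the
/// elapsed wall-clock time. The last element of the result is written
/// to `last` so the computed values are actually used.
Duration timeKernel(T)(size_t n, size_t reps, out T last)
{
    auto a = new T[](n);
    auto b = new T[](n);
    auto c = new T[](n);
    a[] = T(1);   // fill inputs with simple values
    b[] = T(2);

    auto sw = StopWatch(AutoStart.yes);
    foreach (_; 0 .. reps)
        kernel(a, b, c);
    sw.stop();

    last = c[n - 1];
    return sw.peek();
}

void main()
{
    enum size_t n = 1 << 16;   // hypothetical array length
    enum size_t reps = 1000;   // hypothetical repetition count

    float lf;
    double ld;
    Complex!float lcf;
    Complex!double lcd;

    writefln("float          : %s ms",
             timeKernel!float(n, reps, lf).total!"msecs");
    writefln("double         : %s ms",
             timeKernel!double(n, reps, ld).total!"msecs");
    writefln("Complex!float  : %s ms",
             timeKernel!(Complex!float)(n, reps, lcf).total!"msecs");
    writefln("Complex!double : %s ms",
             timeKernel!(Complex!double)(n, reps, lcd).total!"msecs");
}
```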

But I thought I might begin with a simple question: **"Is there some well-known
issue that I am missing here? Have others been down this road as well?"**

Thanks for any and all input.
Best Regards,
James

PS: Sorry for the inelegant table. I do not believe there is a way to include
the beautiful bar charts on this forum; please correct me if there is a way.