Any usable SIMD implementation?

Thu Apr 7 00:59:50 PDT 2016

On Thursday, 7 April 2016 at 03:27:31 UTC, Walter Bright wrote:
>
> I can understand that it might be demotivating for you, but 
> that is not a blocker. A blocker has no reasonable workaround. 
> This has a trivial workaround:
>
>    gdc -simd=AFX foo.d
>
> becomes:
>
>    gdc -simd=AFX -version=AFX foo.d
>
> It's even simpler if you use a makefile variable:
>
>     FPU=AFX
>
>     gdc -simd=$(FPU) -version=$(FPU)

    ldc -mcpu=native

becomes:

    ????
>
> I still don't see how it is a problem to do the switch at a 
> high level. Heck, you could put the ENTIRE ENGINE inside a 
> template, have a template parameter be the instruction set, and 
> instantiate the template for each supported instruction set.
>
> Then,
>
>     void app(int simd)() { ... my fabulous app ... }
>
>     int main() {
>       auto fpu = core.cpuid.getfpu();
>       switch (fpu) {
>         case SIMD: app!(SIMD)(); break;
>         case SIMD4: app!(SIMD4)(); break;
>         default: error("unsupported FPU"); exit(1);
>       }
>     }
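
For reference, the sketch quoted above can be made compilable roughly 
as follows. This is only a minimal sketch under my own assumptions: 
`Isa`, `app` and `run` are hypothetical names, and the two 
`core.cpuid` queries (`avx2`, `sse2`) stand in for the `getfpu()` 
call in the quote, which I am treating as pseudocode.

```d
import core.cpuid : hasAvx2 = avx2, hasSse2 = sse2;
import std.stdio : writeln;

// Hypothetical instruction-set tags; the names are illustrative only.
enum Isa { sse2, avx2 }

// Stand-in for "the entire engine": one copy is compiled per instantiation.
string app(Isa isa)()
{
    static if (isa == Isa.avx2)
        return "AVX2 build of app";
    else
        return "SSE2 build of app";
}

// Runtime dispatch: query the CPU, then call the matching instantiation.
string run()
{
    if (hasAvx2) return app!(Isa.avx2)();
    if (hasSse2) return app!(Isa.sse2)();
    return "unsupported FPU";
}

int main()
{
    immutable result = run();
    writeln(result);
    // Non-zero exit code when no supported instruction set was found.
    return result == "unsupported FPU" ? 1 : 0;
}
```

Every instantiation is a separate copy of `app` in the binary. The 
template parameter only selects source-level code paths; by itself it 
does not change which instructions the compiler may emit for a given 
instantiation.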

1. Executable size will grow with every new instruction set release.
2. BLAS already has a large executable size.
And the main point:
3. This would not solve the problem for a generic BLAS 
implementation for Phobos at all! How would you force the compiler 
to USE and NOT USE specific vector permutations, for example, in 
the same object file? Yes, I know, DMD has no permutations. No, I 
don't want to write permutations for each architecture. Why? I can 
write simple D code that generates a single piece of LLVM IR that 
would work for ALL targets!

Best regards,
Ilya