Any usable SIMD implementation?
9il via Digitalmars-d
digitalmars-d at puremagic.com
Thu Apr 7 00:59:50 PDT 2016
On Thursday, 7 April 2016 at 03:27:31 UTC, Walter Bright wrote:
>
> I can understand that it might be demotivating for you, but
> that is not a blocker. A blocker has no reasonable workaround.
> This has a trivial workaround:
>
> gdc -simd=AFX foo.d
>
> becomes:
>
> gdc -simd=AFX -version=AFX foo.d
>
> It's even simpler if you use a makefile variable:
>
> FPU=AFX
>
> gdc -simd=$(FPU) -version=$(FPU)
ldc -mcpu=native
becomes:
????
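For reference, this is roughly how the `-version=$(FPU)` half of the quoted recipe is consumed in D source: `version` blocks that are compiled only when the matching identifier is set. This is a minimal sketch. `AFX` is the made-up instruction-set name from the quoted example, and `simdPath` is a hypothetical name used for illustration. With dmd and ldc2 the flag is spelled `-version=AFX`; GDC spells it `-fversion=AFX`.

```d
import std.stdio : writeln;

// AFX is the made-up instruction-set name from the quoted example.
// The first branch is compiled only when building with -version=AFX
// (dmd/ldc2) or -fversion=AFX (gdc); otherwise the else branch is used.
version (AFX)
{
    enum simdPath = "AFX kernels";
}
else
{
    enum simdPath = "generic kernels";
}

void main()
{
    writeln("compiled with ", simdPath);
}
```

The `-mcpu=native` question above is about exactly this point: the identifier has to be passed explicitly, so a flag that lets the compiler pick the target by itself has no matching `-version` name to pair with it.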
>
> I still don't see how it is a problem to do the switch at a
> high level. Heck, you could put the ENTIRE ENGINE inside a
> template, have a template parameter be the instruction set, and
> instantiate the template for each supported instruction set.
>
> Then,
>
> void app(int simd)() { ... my fabulous app ... }
>
> int main() {
> auto fpu = core.cpuid.getfpu();
> switch (fpu) {
> case SIMD: app!(SIMD)(); break;
> case SIMD4: app!(SIMD4)(); break;
> default: error("unsupported FPU"); exit(1);
> }
> }
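The quoted dispatch idea can be sketched as a compilable D program. `core.cpuid.getfpu()` does not appear to be a druntime function, so this sketch substitutes druntime's boolean feature queries `core.cpuid.avx2` and `core.cpuid.sse2` (meaningful on x86). The `Simd` tags and the `app`, `detect`, and `dispatch` names are hypothetical, and the body of `app` is a trivial stand-in for a real engine.

```d
import core.cpuid : avx2, sse2;
import std.stdio : writeln;

// Hypothetical instruction-set tags, used only as template arguments.
enum Simd { generic, sse2, avx2 }

// Stand-in for "the ENTIRE ENGINE". Each instantiation is a separate copy
// of this code; a real engine would pick per-tag kernels with static if.
double app(Simd simd)(const(double)[] xs)
{
    double sum = 0;
    foreach (x; xs)
        sum += x;
    return sum;
}

// Runtime CPU detection using druntime's core.cpuid feature queries.
Simd detect()
{
    if (avx2) return Simd.avx2;
    if (sse2) return Simd.sse2;
    return Simd.generic;
}

// The single high-level switch; final switch requires every tag to be handled.
double dispatch(Simd level, const(double)[] xs)
{
    final switch (level)
    {
        case Simd.generic: return app!(Simd.generic)(xs);
        case Simd.sse2:    return app!(Simd.sse2)(xs);
        case Simd.avx2:    return app!(Simd.avx2)(xs);
    }
}

void main()
{
    immutable double[] data = [1.0, 2.0, 3.0, 4.0];
    immutable level = detect();
    writeln("dispatching to ", level, ", sum = ", dispatch(level, data));
}
```

Every tag added to `Simd` adds another full instantiation of `app` to the binary. That is the executable-size growth raised in point 1 of the reply below.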
1. Executable size will grow with every new instruction set release.
2. BLAS already has a big executable size.
And the main one:
3. This would not solve the problem of a generic BLAS
implementation for Phobos at all! How would you force the compiler to
USE and NOT USE specific vector permutations, for example, in the
same object file? Yes, I know, DMD does not have permutations. No, I
don't want to write permutations for each architecture. Why? Because I can
write simple D code that generates a single LLVM IR that
would work for ALL targets!
Best regards,
Ilya