Any usable SIMD implementation?

Tue Apr 5 01:34:32 PDT 2016

On 4/4/2016 11:10 PM, 9il wrote:
> It is impossible to deduce from that combination that Xeon Phi has 32 FP registers.

Since dmd doesn't generate specific code for a Xeon Phi, having a compile time 
switch for it is meaningless.

> "Since the compiler never generates AVX or AVX2" - this is definitely not true,
> see, for example, LLVM vectorization and SLP vectorization.

dmd is not LLVM.

>> It's entirely practical to compile code with different source code, link them
>> *both* into the executable, and switch between them based on runtime detection
>> of the CPU.
> This approach is complex,

Not at all. Used to do it all the time in the DOS world (FPU vs emulation).
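
A minimal sketch of that kind of runtime dispatch, assuming an x86 target and druntime's core.cpuid module. The names dotBaseline, dotWide, DotFn and selectDot are hypothetical, and dotWide is only a stand-in for a kernel that would really be compiled separately for the wider instruction set:

```d
import core.cpuid : avx2;

// Portable fallback kernel (hypothetical name).
float dotBaseline(const(float)[] a, const(float)[] b)
{
    float sum = 0;
    foreach (i; 0 .. a.length)
        sum += a[i] * b[i];
    return sum;
}

// Stand-in for a kernel built for AVX2 (hypothetical name). In a real
// build this would come from a separately compiled object file; here it
// simply forwards to the baseline so the sketch stays self-contained.
float dotWide(const(float)[] a, const(float)[] b)
{
    return dotBaseline(a, b);
}

alias DotFn = float function(const(float)[], const(float)[]);

// Both kernels are linked into the executable; the choice is made once,
// at run time, from what the CPU reports.
DotFn selectDot()
{
    return avx2 ? &dotWide : &dotBaseline;
}

unittest
{
    auto dot = selectDot();
    float[] x = [1, 2, 3, 4];
    float[] y = [4, 3, 2, 1];
    assert(dot(x, y) == 20);
}
```

The caller stores the returned function pointer once and uses it from then on, so the detection cost is paid a single time.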

> I just want a unified instrument to receive CT information about target and
> optimization switches. It is OK if this information would have different
> switches on different compilers.

Optimizations simply do not transfer from one compiler to another, whether the 
switch is the same or not. They are highly implementation dependent.

> Auto vectorization is only an example (maybe a bad one). I would use SIMD vectors, but I
> need CT information about target CPU, because it is impossible to build optimal
> BLAS kernels without it!

I still don't understand why you cannot just set '-version=xxx' on the command 
line and then branch on that version in your custom code.
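
A minimal sketch of what that could look like. The version identifiers UseAVX2 and UseSSE2 and the names kernelName and blockWidth are made up for illustration; they are not predefined by any compiler and would be supplied by your own build, e.g. `dmd -version=UseAVX2 kernels.d`:

```d
// Select kernel parameters at compile time from a user-supplied
// -version=... switch. All identifiers here are hypothetical.
version (UseAVX2)
{
    enum kernelName = "avx2";
    enum size_t blockWidth = 8; // floats per 256-bit vector
}
else version (UseSSE2)
{
    enum kernelName = "sse2";
    enum size_t blockWidth = 4; // floats per 128-bit vector
}
else
{
    // No switch given: fall back to scalar code.
    enum kernelName = "generic";
    enum size_t blockWidth = 1;
}

// The chosen values are ordinary compile-time constants and can drive
// static if, template arguments, static array sizes, and so on.
static assert(blockWidth >= 1);
```

Since the build script picks the switch, the same source works with any D compiler that accepts a version switch, whatever target options that compiler exposes.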