Auto-Vectorization and array/vector operations

Wed Jul 15 15:42:03 PDT 2015

Earlier today I was trying to show someone how awesome D is, and 
how its array (vector) operations are expected to take advantage 
of the CPU's vector instructions, and I was dumbstruck when dmd 
and gdc both failed to auto-vectorize a simple case.  I've 
stripped it down to the bare minimum and loaded the example into 
the interactive compiler: 
http://asm.dlang.org/#%7B%22version%22%3A3%2C%22filterAsm%22%3A%7B%7D%2C%22compilers%22%3A%5B%7B%22sourcez%22%3A%22JYWwDg9gTgLgBAY2gUwHQGdQBMDcAoPAMwBsIBDGAbQCYBWANgF05kAPM8Y5AQQAoTyVOkzhkANHAEUaDZgCMAlHgDeeOJNLThzBPnUB6fXCjJ0AV2Ix0cYADs45uenRrElZgF5R7uAFo4cu56xsgwZlD2ungAvgRSQrIs7JzIAEL8mgki4hqCMiKKKq7xAByUAMzUjABuZHBeCGToMBmCZZWMCmTBpRVV1XL1iE0tvR0Kcj2Z7f1R6q6GIeaW1nYOZnJgLurVCD5etT7%2BA0EE6iZhEcPNrVqyCrv40UAAA%3D%22%2C%22compiler%22%3A%22dmd2067%22%2C%22options%22%3A%22-O%20-release%20-inline%20-boundscheck%3Doff%22%7D%5D%7D

The reference documentation for arrays says:
Implementation note: many of the more common vector operations 
are expected to take advantage of any vector math instructions 
available on the target computer.

Does this mean that while compilers are expected to take 
advantage of these instructions, they currently do not, even when 
the data is properly aligned?  I haven't tried LDC yet, so maybe 
LDC does perform auto-vectorization and I should switch to LDC if 
I plan on using vector ops a lot?

import core.simd;

float[256] exampleA(float[256] a, float[256] b)
{
   float[256] c;
   // results in subss (scalar instruction)
   c[] = a[] - b[];
   return c;
}

float[256] exampleB(float[256] a, float[256] b)
{
   float8[32] va = cast(float8[32])a;
   float8[32] vb = cast(float8[32])b;
   float8[32] vc;

   // results in subps (vector instruction)
   vc[] = va[] - vb[];

   return cast(float[256])vc;
}
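
For reference, here is a smaller sketch of the explicit vector types that exampleB relies on. It is not part of the stripped-down test case above: the function name subtractDemo is made up for illustration, and it assumes an x86-64 target where core.simd defines float4 (four 32-bit floats in one 128-bit register).

```d
import core.simd;

// Hypothetical helper for illustration only (not from the example above).
// Subtracts two 4-element float vectors with a single vector operation
// and returns the result as a plain static array.
float[4] subtractDemo()
{
    // Assumes float4 is available (x86-64 with SIMD support in the compiler).
    float4 a = [1.0f, 2.0f, 3.0f, 4.0f];
    float4 b = [0.5f, 0.5f, 0.5f, 0.5f];

    // Element-wise subtraction on the vector type itself, no array slice syntax.
    float4 c = a - b;

    // .array exposes the vector's lanes as a float[4].
    return c.array;
}
```

Calling subtractDemo() from main and comparing the result against a float[4] is a quick way to confirm that the vector type behaves like the equivalent scalar loop before looking at the generated assembly.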