Using SSE3 vector shuffle with LDC

Sun May 26 13:54:32 UTC 2019

On Sunday, 26 May 2019 at 12:10:30 UTC, KytoDragon wrote:
> I have been trying to port some programs to D that heavily use 
> SSE instructions.
> In particular, I still need _mm_shuffle_epi8, _mm_alignr_epi8 
> and _mm_aesdec_si128.
> LDC does not support the core.simd approach and ldc.simd only 
> supports a few operations, including a vector shuffle with a 
> fixed mask (I need a variable mask).
> So how would one go about using these with LDC?
>
> I need to be able to:
> - consistently generate SSE instructions, even in debug builds.
> - inline the function.
>
> I have been unable to find a solution using either the simd 
> package, inline asm or inline llvm-ir.

There's https://github.com/AuburnSounds/intel-intrinsics which 
tries to be compatible with the Intel intrinsic names.

_mm_aesdec_si128 is available in ldc.gccbuiltins_x86 as 
__builtin_ia32_aesdec128; _mm_shuffle_epi8 as 
__builtin_ia32_pshufb128. Make sure to tell LDC that the 
instructions are available, e.g. via `-mattr=+ssse3` (plus `+aes` 
for the AES builtin) on the LDC command line.
I haven't found anything corresponding to _mm_alignr_epi8, but 
inline asm can always be used. Here's an example of a manual 
__builtin_ia32_pshufb128 using LLVM inline assembly:

alias byte16 = __vector(byte[16]);

version (Manual)
{
     pragma(inline, true)
     byte16 _mm_shuffle_epi8(byte16 a, byte16 b)
     {
         import ldc.llvmasm;
         return __asm!byte16("pshufb $2, $1", "=x,0,x", a, b);
     }
}
else
{
     import ldc.gccbuiltins_x86 : _mm_shuffle_epi8 = __builtin_ia32_pshufb128;
}

void main()
{
     byte16 a = [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 ];
     byte16 b = [ -1, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12 ];
     const actual = _mm_shuffle_epi8(a, b);
     byte16 expected = b;
     expected[0] = 0;
     assert(actual == expected);
}
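
For _mm_alignr_epi8, the byte count is a compile-time constant, so the fixed-mask shuffle from ldc.simd mentioned in the question may already be enough. Below is a minimal, untested sketch under that assumption. The helper name `alignr` and the 0 to 16 restriction on `n` are made up for this example (the Intel intrinsic also accepts larger counts, which shift in zeros). Whether an unoptimized build really emits a single `palignr` is something to check in the generated assembly.

```d
// Sketch only; assumes LDC on x86/x86-64, built with e.g. -mattr=+ssse3.
import ldc.simd : shufflevector;
import std.meta : aliasSeqOf;
import std.range : iota;

alias byte16 = __vector(byte[16]);

// Hypothetical helper (the name alignr is invented for this example).
// Treats a as the high half and b as the low half of a 32-byte value,
// shifts that value right by n bytes and returns the low 16 bytes.
byte16 alignr(int n)(byte16 a, byte16 b)
if (n >= 0 && n <= 16)
{
    // Mask indices 0 .. 15 select from the first shufflevector operand,
    // 16 .. 31 from the second, so b goes first and a second.
    alias mask = aliasSeqOf!(iota(n, n + 16));
    return shufflevector!(byte16, mask)(b, a);
}

void main()
{
    byte16 a = [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 ];
    byte16 b = [ 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31 ];
    const r = alignr!4(a, b);
    // Bytes 4 .. 15 of b, followed by bytes 0 .. 3 of a.
    foreach (i; 0 .. 16)
        assert(r.array[i] == (i + 20) % 32);
}
```

Making `n` a template parameter mirrors the immediate operand of the instruction: the mask has to be known at compile time either way.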