SIMD under LDC

Igor via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Wed Sep 6 13:43:01 PDT 2017


On Wednesday, 6 September 2017 at 09:01:18 UTC, Igor wrote:
> On Tuesday, 5 September 2017 at 18:50:34 UTC, Johan Engelen 
> wrote:
>> On Monday, 4 September 2017 at 20:39:11 UTC, Igor wrote:
>>> I found that I can't use the __simd function from core.simd 
>>> under LDC. LDC has ldc.simd instead, but I couldn't find how 
>>> to implement the equivalent of this with it:
>>>
>>> ubyte16* masks = ...;
>>> foreach (ref c; pixels) {
>>> 	c = __simd(XMM.PSHUFB, c, *masks);
>>> }
>>>
>>> I see it has a shufflevector function, but it only accepts 
>>> constant masks and I am using a variable one. Is this 
>>> possible under LDC?
>>
>> You can use __builtin_ia32_pshufb128 and 
>> __builtin_ia32_pshufb256 from the module ldc.gccbuiltins_x86 
>> (file gccbuiltins_x86.di).
>>
>> (also see 
>> https://gcc.gnu.org/onlinedocs/gcc-4.4.5/gcc/X86-Built_002din-Functions.html)
>>
>> Please file a feature request about shufflevector with a 
>> variable mask in our (LDC) issue tracker on GitHub, with some 
>> code that you'd expect to work. Thanks.
>>
>> - Johan
>
> I'll try that this evening. Thanks! I'll also open an issue, but 
> are you sure such a feature request is valid, since the LLVM 
> shufflevector instruction, as far as I can see, only supports 
> constant masks as well?

I opened a feature request on GitHub. I also tried using the 
gccbuiltins, but I got this error:

LLVM ERROR: Cannot select: 0x2199c96fd70: v16i8 = X86ISD::PSHUFB 0x2199c74e9a8, 0x2199c74d6c0
   0x2199c74e9a8: v16i8,ch = CopyFromReg 0x21994bcfd90, Register:v16i8 %vreg384
     0x2199c96fb00: v16i8 = Register %vreg384
   0x2199c74d6c0: v16i8,ch = CopyFromReg 0x21994bcfd90, Register:v16i8 %vreg385
     0x2199c74ed50: v16i8 = Register %vreg385
In function: _D7assetdb12loadBmpImageFAxaZf
Building x64\LDCDebug\DNgin.exe failed!
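
One possible reading of this error, offered as an assumption rather than a confirmed diagnosis: PSHUFB is an SSSE3 instruction, and LLVM cannot select it when SSSE3 is not enabled for the target (the baseline x86-64 target only guarantees SSE2). Below is a minimal sketch of two things. The first is a plain scalar reference for what PSHUFB computes on one 16-byte lane, which works with any D compiler. The second is the LDC variant of the loop above, with SSSE3 enabled for just that function through the @target attribute from ldc.attributes. The names pshufbScalar and shufflePixels are hypothetical and do not come from the linked code.

```d
// Scalar reference for one 16-byte lane of PSHUFB:
// a mask byte with its high bit set produces 0; otherwise its
// low four bits pick one of the 16 source bytes.
ubyte[16] pshufbScalar(const ubyte[16] src, const ubyte[16] mask)
{
    ubyte[16] result;
    foreach (i, m; mask)
    {
        if (m & 0x80)
            result[i] = 0;
        else
            result[i] = src[m & 0x0F];
    }
    return result;
}

version (LDC) version (X86_64)
{
    import core.simd : byte16, ubyte16;
    import ldc.attributes : target;
    import ldc.gccbuiltins_x86 : __builtin_ia32_pshufb128;

    // Assumption: the "Cannot select" error comes from SSSE3 being
    // disabled. @target("ssse3") enables it for this function only.
    // shufflePixels is a hypothetical helper, not code from the repository.
    @target("ssse3")
    void shufflePixels(ubyte16[] pixels, ubyte16* masks)
    {
        foreach (ref c; pixels)
        {
            // The builtin works on byte16, so cast in and out.
            c = cast(ubyte16) __builtin_ia32_pshufb128(
                cast(byte16) c, cast(byte16) *masks);
        }
    }
}
```

If the assumption holds, passing -mattr=+ssse3 to ldc2 should have a similar effect for the whole build, at the cost of requiring SSSE3 on every machine that runs the binary.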

You can see the code I used here: 
https://github.com/igor84/dngin/blob/3c171330843af71170a6ee4ae164a76bf58c35f6/source/assetdb.d#L123

Note that if you want to try it, you will need a test.bmp in a 
specific format where header.compression == 3, like this one: 
https://drive.google.com/file/d/0B9l8IgnRaPwCU0hIWEtHUElhTTg/view?usp=sharing


