SIMD under LDC
Igor via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Wed Sep 6 13:43:01 PDT 2017
On Wednesday, 6 September 2017 at 09:01:18 UTC, Igor wrote:
> On Tuesday, 5 September 2017 at 18:50:34 UTC, Johan Engelen
> wrote:
>> On Monday, 4 September 2017 at 20:39:11 UTC, Igor wrote:
>>> I found that I can't use the __simd function from core.simd under
>>> LDC and that it has ldc.simd, but I couldn't find how to
>>> implement an equivalent of this with it:
>>>
>>> ubyte16* masks = ...;
>>> foreach (ref c; pixels) {
>>> c = __simd(XMM.PSHUFB, c, *masks);
>>> }
>>>
>>> I see it has a shufflevector function, but it only accepts
>>> constant masks and I am using a variable one. Is this
>>> possible under LDC?
>>
>> You can use the module ldc.gccbuiltins_x86.di,
>> __builtin_ia32_pshufb128 and __builtin_ia32_pshufb256.
>>
>> (also see
>> https://gcc.gnu.org/onlinedocs/gcc-4.4.5/gcc/X86-Built_002din-Functions.html)
>>
>> Please file a feature request about shufflevector with a
>> variable mask in our (LDC) issue tracker on GitHub, with some
>> code that you'd expect to work. Thanks.
>>
>> - Johan
>
> I'll try that this evening. Thanks! I'll also open an issue, but
> are you sure such a feature request is valid, since the LLVM
> shufflevector instruction, as far as I can see, only supports
> constant masks as well?
I opened a feature request on GitHub. I also tried using the
gccbuiltins, but I got this error:
LLVM ERROR: Cannot select: 0x2199c96fd70: v16i8 = X86ISD::PSHUFB 0x2199c74e9a8, 0x2199c74d6c0
  0x2199c74e9a8: v16i8,ch = CopyFromReg 0x21994bcfd90, Register:v16i8 %vreg384
    0x2199c96fb00: v16i8 = Register %vreg384
  0x2199c74d6c0: v16i8,ch = CopyFromReg 0x21994bcfd90, Register:v16i8 %vreg385
    0x2199c74ed50: v16i8 = Register %vreg385
In function: _D7assetdb12loadBmpImageFAxaZf
Building x64\LDCDebug\DNgin.exe failed!
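A "Cannot select" failure on X86ISD::PSHUFB usually means the backend was asked to emit PSHUFB for a function whose target features do not include SSSE3 (the x86-64 baseline only guarantees SSE2). The sketch below shows one way this might be addressed, assuming that is the cause here. The helper name shufflePixels is hypothetical and not taken from the linked code. It also assumes the builtin is declared on byte16, hence the casts. Passing -mattr=+ssse3 to ldc2 would be an alternative to the @target attribute.

```d
// Sketch only: assumes LDC on x86/x86-64 and a CPU with SSSE3.
import core.simd : byte16, ubyte16;
import ldc.attributes : target;
import ldc.gccbuiltins_x86 : __builtin_ia32_pshufb128;

// shufflePixels is a hypothetical helper name used for illustration.
// @target("ssse3") enables SSSE3 code generation for this function only.
@target("ssse3")
void shufflePixels(ubyte16[] pixels, ubyte16 mask)
{
    foreach (ref c; pixels)
    {
        // Assumption: the builtin takes and returns byte16, so the
        // unsigned vectors are reinterpreted with same-size vector casts.
        c = cast(ubyte16) __builtin_ia32_pshufb128(cast(byte16) c,
                                                   cast(byte16) mask);
    }
}
```

Because the mask is an ordinary runtime value here, this does not rely on shufflevector's compile-time mask.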
You can see the code I used here:
https://github.com/igor84/dngin/blob/3c171330843af71170a6ee4ae164a76bf58c35f6/source/assetdb.d#L123
Note that if you want to try it, you will need a test.bmp in a
specific format where header.compression == 3, like this one:
https://drive.google.com/file/d/0B9l8IgnRaPwCU0hIWEtHUElhTTg/view?usp=sharing
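To check results independently of the backend, the per-byte behaviour of the 128-bit PSHUFB can be written as a plain scalar loop. This is a minimal reference sketch, not the code from the repository above; pshufbScalar is a hypothetical name, and the unittest values are made up for illustration.

```d
// Scalar reference for 128-bit PSHUFB semantics.
// pshufbScalar is a hypothetical name used only for this illustration.
ubyte[16] pshufbScalar(const ubyte[16] src, const ubyte[16] mask)
    pure nothrow @nogc @safe
{
    ubyte[16] result; // elements default to 0 (ubyte.init)
    foreach (i; 0 .. 16)
    {
        // A mask byte with its high bit set zeroes the output byte;
        // otherwise its low four bits select a byte from src.
        if ((mask[i] & 0x80) == 0)
            result[i] = src[mask[i] & 0x0F];
    }
    return result;
}

unittest
{
    ubyte[16] src  = [10, 11, 12, 13, 14, 15, 16, 17,
                      18, 19, 20, 21, 22, 23, 24, 25];
    ubyte[16] mask = [3, 2, 1, 0, 0x80, 0x80, 0x80, 0x80,
                      15, 15, 15, 15, 0, 0, 0, 0];
    ubyte[16] expected = [13, 12, 11, 10, 0, 0, 0, 0,
                          25, 25, 25, 25, 10, 10, 10, 10];
    assert(pshufbScalar(src, mask) == expected);
}
```

Running the file with the -unittest switch exercises the check, and the same function can be compared against the vectorized path on real pixel data.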