ldc/dcompute nvptx intrinsics

Johan j at j.nl
Tue Feb 23 18:04:52 UTC 2021


On Sunday, 21 February 2021 at 01:18:10 UTC, Bruce Carneal wrote:
> On Saturday, 20 February 2021 at 12:38:35 UTC, Johan Engelen 
> wrote:
>> On Friday, 19 February 2021 at 20:02:29 UTC, Bruce Carneal 
>> wrote:
>>>
>>> I'd love to bring up some additional functionality but I'm 
>>> new to LDC/LLVM so it's slow going at this point of the 
>>> learning curve.  Does anyone have additional LDC/dcompute 
>>> CUDA intrinsics working or have some pointers for bringing up 
>>> more CUDA/warp intrinsics generally?
>>
>> Hi Bruce,
>>   You can write LLVM IR inside D code using 
>> ldc.llvmasm.__irEx. Perhaps that works as a quick workaround 
>> for defining calls to intrinsics.
>> Very quickly, something like this:
>>
>> ```
>> alias someFunction = __irEx!("forward declaration of 
>> intrinsic", "call intrinsic(%0,%1,%2)", int, long, void*);
>> ```
>>
>> -Johan
>
> Here's what I get so far when testing the above out against 
> llvm.nvvm.barrier0.popc (something that works as an 
> LDC_intrinsic):
>
> "can only call functions from other @compute modules in 
> @compute code"
>
> I'll keep digging.

Hi Bruce,
   I played around a bit and have a full working example for you:

```
@compute(CompileFor.deviceOnly) module dcompute;

import ldc.dcompute;

// Copied from ldc.llvmasm, because indeed it needs to be defined 
in @compute code.
pragma(LDC_inline_ir)
     R __irEx(string prefix, string code, string suffix, R, 
P...)(P) @trusted nothrow @nogc;

alias someFunction = __irEx!("declare i32 
@llvm.nvvm.barrier0.popc(i32)", "%i = call i32 
@llvm.nvvm.barrier0.popc(i32 %0)\nret i32 %i", "", int, int);


void foo(GlobalPointer!float x_in) {
     auto i = someFunction(2);
}
```

Compile with `bin/ldc2 -c -mdcompute-targets=cuda-350 
-mdcompute-file-prefix=dcompute_kernel  dcompute.d -O3` gives PTX 
output:

```
❯ cat dcompute_kernel_cuda350_64.ptx
//
// Generated by LLVM NVPTX Back-End
//

.version 3.2
.target sm_35
.address_size 64

         // .globl       
_D8dcompute3fooFS3ldcQt__T7PointerVEQtQBk9AddrSpacei1TfZQBeZv

.visible .func 
_D8dcompute3fooFS3ldcQt__T7PointerVEQtQBk9AddrSpacei1TfZQBeZv(
         .param .b64 
_D8dcompute3fooFS3ldcQt__T7PointerVEQtQBk9AddrSpacei1TfZQBeZv_param_0
)
{
         .reg .b32       %r<3>;

         mov.u32         %r1, 2;
         {
         .reg .pred      %p1;
         setp.ne.u32     %p1, %r1, 0;
         bar.red.popc.u32        %r2, 0, %p1;
         }
         ret;

}
```

If this indeed will fit your usecase, you have a good argument 
for including `__irEx` into ldc.dcompute. Please file 
bugs/features on github!

cheers,
   Johan




More information about the digitalmars-d-ldc mailing list