ldc/dcompute nvptx intrinsics
Johan
j at j.nl
Tue Feb 23 18:04:52 UTC 2021
On Sunday, 21 February 2021 at 01:18:10 UTC, Bruce Carneal wrote:
> On Saturday, 20 February 2021 at 12:38:35 UTC, Johan Engelen
> wrote:
>> On Friday, 19 February 2021 at 20:02:29 UTC, Bruce Carneal
>> wrote:
>>>
>>> I'd love to bring up some additional functionality but I'm
>>> new to LDC/LLVM so it's slow going at this point of the
>>> learning curve. Does anyone have additional LDC/dcompute
>>> CUDA intrinsics working or have some pointers for bringing up
>>> more CUDA/warp intrinsics generally?
>>
>> Hi Bruce,
>> You can write LLVM IR inside D code using
>> ldc.llvmasm.__irEx. Perhaps that works as a quick workaround
>> for defining calls to intrinsics.
>> Very quickly, something like this:
>>
>> ```
>> alias someFunction = __irEx!("forward declaration of
>> intrinsic", "call intrinsic(%0,%1,%2)", int, long, void*);
>> ```
>>
>> -Johan
>
> Here's what I get so far when testing the above out against
> llvm.nvvm.barrier0.popc (something that works as an
> LDC_intrinsic):
>
> "can only call functions from other @compute modules in
> @compute code"
>
> I'll keep digging.
Hi Bruce,
I played around a bit and have a full working example for you:
```
@compute(CompileFor.deviceOnly) module dcompute;
import ldc.dcompute;
// Copied from ldc.llvmasm, because indeed it needs to be defined
in @compute code.
pragma(LDC_inline_ir)
R __irEx(string prefix, string code, string suffix, R,
P...)(P) @trusted nothrow @nogc;
alias someFunction = __irEx!("declare i32
@llvm.nvvm.barrier0.popc(i32)", "%i = call i32
@llvm.nvvm.barrier0.popc(i32 %0)\nret i32 %i", "", int, int);
void foo(GlobalPointer!float x_in) {
auto i = someFunction(2);
}
```
Compile with `bin/ldc2 -c -mdcompute-targets=cuda-350
-mdcompute-file-prefix=dcompute_kernel dcompute.d -O3` gives PTX
output:
```
❯ cat dcompute_kernel_cuda350_64.ptx
//
// Generated by LLVM NVPTX Back-End
//
.version 3.2
.target sm_35
.address_size 64
// .globl
_D8dcompute3fooFS3ldcQt__T7PointerVEQtQBk9AddrSpacei1TfZQBeZv
.visible .func
_D8dcompute3fooFS3ldcQt__T7PointerVEQtQBk9AddrSpacei1TfZQBeZv(
.param .b64
_D8dcompute3fooFS3ldcQt__T7PointerVEQtQBk9AddrSpacei1TfZQBeZv_param_0
)
{
.reg .b32 %r<3>;
mov.u32 %r1, 2;
{
.reg .pred %p1;
setp.ne.u32 %p1, %r1, 0;
bar.red.popc.u32 %r2, 0, %p1;
}
ret;
}
```
If this indeed will fit your usecase, you have a good argument
for including `__irEx` into ldc.dcompute. Please file
bugs/features on github!
cheers,
Johan
More information about the digitalmars-d-ldc
mailing list