ldc/dcompute nvptx intrinsics

Tue Feb 23 23:36:53 UTC 2021

On Tuesday, 23 February 2021 at 18:04:52 UTC, Johan wrote:
> On Sunday, 21 February 2021 at 01:18:10 UTC, Bruce Carneal 
> wrote:
>> On Saturday, 20 February 2021 at 12:38:35 UTC, Johan Engelen 
>> wrote:
>>> On Friday, 19 February 2021 at 20:02:29 UTC, Bruce Carneal 
>>> wrote:
>>>>
>>>> I'd love to bring up some additional functionality but I'm 
>>>> new to LDC/LLVM so it's slow going at this point of the 
>>>> learning curve.  Does anyone have additional LDC/dcompute 
>>>> CUDA intrinsics working or have some pointers for bringing 
>>>> up more CUDA/warp intrinsics generally?
>>>
>
> Hi Bruce,
>   I played around a bit and have a full working example for you:
>
> ```
> @compute(CompileFor.deviceOnly) module dcompute;
>
> import ldc.dcompute;
>
> [... working example ...]
> ```
>
> If this indeed fits your use case, you have a good argument
> for including `__irEx` into ldc.dcompute. Please file
> bugs/features on GitHub!
>
> cheers,
>   Johan

Success!

After verifying that your latest example worked, I plugged in
llvm.nvvm.clz.i from the .td file.  That generated the hoped-for
single-instruction function body:
   clz.b32  %r2, %r1
This clz intrinsic alone saves me a couple dozen instructions in
a hot section of code where I "call" clz twice.
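
For anyone following along, here is a minimal sketch of what such a wrapper might look like. It is not the code from the example above. The module name, the wrapper name `clz`, and the local declaration of `__irEx` (using LDC's inline-IR pragma) are assumptions for illustration, as is the `i32 -> i32` signature given for llvm.nvvm.clz.i.

```d
// Sketch only: module name and wrapper name are hypothetical.
@compute(CompileFor.deviceOnly) module clz_sketch;

import ldc.dcompute;

// Assumption: __irEx is declared locally via LDC's inline-IR pragma,
// since it is not (yet) provided by ldc.dcompute.
pragma(LDC_inline_ir)
    R __irEx(string prefix, string code, string suffix, R, P...)(P);

// Count leading zeros of a 32-bit value through the NVVM intrinsic.
pragma(inline, true)
int clz(int x)
{
    return __irEx!(
        // prefix: module-level IR, here the intrinsic's declaration
        `declare i32 @llvm.nvvm.clz.i(i32)`,
        // code: the function body; %0 refers to the first argument
        `%r = call i32 @llvm.nvvm.clz.i(i32 %0)
         ret i32 %r`,
        // suffix: nothing needed after the generated function
        ``,
        int, int)(x);
}
```

If this matches what LDC expects, the inlined call should lower to the single clz.b32 instruction shown above. Other intrinsics from the .td file would presumably follow the same pattern, with the declaration and types adjusted.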

I will expand the set of intrinsics enabled via the __irEx method
over the next few days and then try to contact Nicholas W. and/or
John C. via email or BeerConf to get their take on the
capabilities (they may suggest a more easily supported way to go
about things, or have CUDA naming suggestions, or want to
rationalize these with OCL, or ...).  Assuming that goes well,
I'll file with LDC and dcompute.

Thank you Johan.