intel-intrinsics v1.0.0

Wed Feb 6 01:05:29 UTC 2019

"intel-intrinsics" is a DUB package for people interested in x86 
performance that want neither to write assembly, nor a 
LDC-specific snippet... and still have fastest possible code.

Available through DUB: 
http://code.dlang.org/packages/intel-intrinsics

*** Features of v1.1.0:

- All intrinsics in this list: 
https://software.intel.com/sites/landingpage/IntrinsicsGuide/#techs=MMX,SSE,SSE2 Use existing Intel documentation and syntax

- write the same code for both DMD and LDC, in the last 6 
versions for each. (Note that debug performance might suffer a 
lot when no inlining is activated.)

- Use operators on SIMD vectors as if core.simd were implemented 
on DMD 32-bit

- Introduces int2 and float2 because short SIMD vectors are useful

- about 6000 LOC (for now! more to come)

- Bonus: approximated pow/exp/log. Perform 4 approximated pow at 
once.

<future>
The long-term goal for this library is to be _only about 
semantics_, and not particularly codegen(!). This is because LLVM 
IR is portable, so forcing a particular instruction is undoing 
this portability work. **This can seem odd** for an "intrinsics" 
library but this way exact codegen options can be choosen by the 
library user, and most intrinsics can gracefuly degrade to 
portable IR in theory.

In the future, "magic" LLVM intrinsics will only be used when 
built for x86, but I think all of it can become portable and not 
x86-specific. Besides, there is a trend in LLVM to remove magic 
intrinsics once they are doable with IR only.
</future>

tl;dr you can use "intel-intrinsics" today, and get quite-optimal 
code with LDC, without duplication. You may come across early 
bugs too.
http://code.dlang.org/packages/intel-intrinsics

(note: it's important to bench against vanilla D code or arrays 
ops too, in some case the vanilla code wins)