How about implementing SPMD on SIMD for D?
Random D user
no at email.com
Sun Jul 8 19:07:57 UTC 2018
On Saturday, 7 July 2018 at 13:26:10 UTC, Guillaume Piolat wrote:
> On Friday, 6 July 2018 at 23:08:27 UTC, Random D user wrote:
>> Especially, since D doesn't even attempt any
>> auto-vectorization (poor results and difficult to implement)
>> and manual loops are quite tedious to write (even std.simd
>> failed to materialize), so SPMD would be nice alternative.
>
> I think you are mistaken, D code is autovectorized often when
> using LDC.
That is good to know.
I haven't looked that much into LDC (or clang). I mostly use dmd
for its fast edit-compile cycle, although the plan is to use LDC
for "release"/optimized builds eventually.
Anyway, I just want to write some non-trivial loops in SIMD
without fiddling with intrinsics or writing a higher-level
wrapper for them.
In my experience, you can only get the real benefits out of SIMD
if you carefully handcraft your hot loops to fully use it.
Sprinkling some SIMD here and there with a SIMD vector type
doesn't really seem to yield big benefits.
>
> Sometimes it's not and it's hard to know why.
Exactly.
In my experience, compilers (MSVC) often don't vectorize.
> A pragma we could have is the one in the Intel C++ Compiler
> that says "hey this loop is safe to autovectorize".
>
>> What do you think?
>
> I think that ispc is like OpenCL on the CPU, but can't work on
> the GPU, FPGA or other OpenCL implementation. OpenCL is so fast
> because caching is explicit (several levels of memory are
> exposed).
Yeah, it should be similar. The point is not to run it on the
GPU; you can use CUDA, OpenCL, compute shaders etc. for that.
CPU code is much easier to debug, and sometimes you're already
doing things on the GPU, but your CPU side has more room for
computation. And you don't have to copy your data between the GPU
and CPU or deal with latency.
Of course, OpenCL runs on the CPU too, but I think there's quite
a bit of code required to set it up and to use it.
I guess my point was that I would like to write CPU SIMD code
easily, without intrinsics (or manually trying to trick the
compiler into vectorizing the code). SPMD stuff seems to solve
these issues. It would also be a forward-looking step for D.
Ideally, you would just write your loop normally, debug it, and
add an annotation to get it to run fast on SIMD. Done.
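
For illustration, the closest existing thing to that workflow is
probably OpenMP's simd pragma in C. The sketch below is only an
assumption of what "write it normally, then annotate" could look
like: the function name saxpy and its parameters are made up for
the example, this is not a D feature, and whether the loop really
ends up vectorized is still up to the compiler.

```c
#include <stddef.h>

/* Hypothetical example kernel: y[i] = a * x[i] + y[i].
   `restrict` promises the compiler that x and y do not overlap. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    /* OpenMP's "this loop is safe to vectorize" hint. The guard keeps
       builds without OpenMP (e.g. without -fopenmp) from seeing the
       pragma at all; in that case this is an ordinary scalar loop. */
#ifdef _OPENMP
#pragma omp simd
#endif
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

The loop body is identical with or without the annotation, so it
can be debugged as plain scalar code first and the hint added
afterwards.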