Any usable SIMD implementation?
Joe Duarte via Digitalmars-d
digitalmars-d at puremagic.com
Sun Apr 17 17:27:06 PDT 2016
On Tuesday, 5 April 2016 at 10:27:46 UTC, Walter Bright wrote:
> Besides, I think it's a poor design to customize the app for
> only one SIMD type. A better idea (I've repeated this ad
> nauseum over the years) is to have n modules, one for each
> supported SIMD type. Compile and link all of them in, then
> detect the SIMD type at runtime and call the corresponding
> module. (This is how the D array ops are currently implemented.)
There are many organizations in the world that are building
software in-house, where such software is targeted to modern CPU
SIMD types, most typically AVX/AVX2 and crypto instructions.
In these settings -- many of them scientific compute or big data
center operators -- they know what servers they have, what CPU
platforms they have. They don't care about portability to older
computers. A runtime check for features that are already part of
their baseline would make no sense for them, and it would
probably be a waste of time for them to design code to run on
pre-AVX silicon. (AVX is not new anymore; it first shipped with
Sandy Bridge in 2011.)
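To make the contrast concrete, here is a minimal sketch in D of the two approaches: runtime dispatch as described in the quote, and a baseline fixed at compile time. The kernel names (sumBaseline, sumAvx2, pickSum, sumFixed) are hypothetical, and both kernel bodies are plain loops so the sketch runs anywhere. It assumes core.cpuid's avx2 query and the predefined D_AVX2 version identifier, which older compilers may not define.

```d
import core.cpuid : avx2;
import std.stdio : writeln;

// Hypothetical kernels. Both bodies are plain loops so the sketch runs
// anywhere; in real code sumAvx2 would live in a module built for AVX2
// (or use core.simd) and sumBaseline would be the portable fallback.
float sumBaseline(const(float)[] xs)
{
    float total = 0;
    foreach (x; xs)
        total += x;
    return total;
}

float sumAvx2(const(float)[] xs)
{
    float total = 0;
    foreach (x; xs)
        total += x;
    return total;
}

alias SumFn = float function(const(float)[]);

// Runtime dispatch: link both kernels, query the CPU, return a pointer
// to the matching one.
SumFn pickSum()
{
    return avx2 ? &sumAvx2 : &sumBaseline;
}

// Fixed baseline: the choice is made at compile time, with no runtime
// check. D_AVX2 is assumed to be set when the compiler targets AVX2
// (for example dmd -mcpu=avx2).
version (D_AVX2)
    alias sumFixed = sumAvx2;
else
    alias sumFixed = sumBaseline;

void main()
{
    immutable float[] data = [1.0f, 2.0f, 3.0f, 4.0f];
    SumFn sum = pickSum();
    writeln(sum(data));      // kernel chosen at runtime
    writeln(sumFixed(data)); // kernel bound at compile time
}
```

An in-house shop with a known fleet would only need the second form; the first is what a portable library has to pay for.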
Good examples of this kind of work can be found on Cloudflare's
blog, especially Vlad Krasnov's posts. Here's one where he
accelerates Golang's crypto libraries:
https://blog.cloudflare.com/go-crypto-bridging-the-performance-gap/
Companies like CF probably spend millions of dollars on
electricity, and there are some workloads where AVX-optimized
code can yield tangible monetary savings.
Someone else talked about marking "Broadwell" and other
generation names. As others have said, it's better to specify
features. I wanted to chime in with a couple of additional
examples. Intel's Transactional Synchronization Extensions (TSX)
are only available on some Broadwell parts because there was a
bug in the original implementation (Haswell and early Broadwell),
so TSX is disabled on most of those chips. But the new Broadwell
server chips have it, and it's a big deal for some DB workloads.
Similarly, only some Skylake chips have the Software Guard
Extensions (SGX), which are very powerful for creating secure
enclaves on an untrusted host.
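As a rough illustration of checking features instead of generation names, the following D sketch queries individual feature bits through core.cpuid. TSX shows up as two separate bits, HLE and RTM. The lockStrategy function and its return strings are hypothetical, and SGX is omitted because the sketch sticks to queries core.cpuid is known to provide.

```d
import core.cpuid : aes, avx, avx2, hle, rtm;
import std.stdio : writefln;

// Hypothetical policy function: the decision depends on a feature bit,
// not on whether the chip is called "Haswell" or "Broadwell".
string lockStrategy(bool hasRtm)
{
    return hasRtm ? "transactional" : "lock-based";
}

void main()
{
    // Query each feature on its own; two chips of the same generation
    // can report different answers.
    immutable bool hasAvx  = avx;
    immutable bool hasAvx2 = avx2;
    immutable bool hasAes  = aes;
    immutable bool hasHle  = hle; // TSX: hardware lock elision
    immutable bool hasRtm  = rtm; // TSX: restricted transactional memory

    writefln("AVX=%s AVX2=%s AES=%s HLE=%s RTM=%s",
             hasAvx, hasAvx2, hasAes, hasHle, hasRtm);
    writefln("lock strategy: %s", lockStrategy(hasRtm));
}
```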
On the broader SIMD-as-first-class-citizen issue, I think it
would be worth thinking about how to bake SIMD into the language
instead of bolting it on. If I were designing a new language in
2016, I would take a fresh look at how SIMD could be baked into a
language's core constructs. I'd think about new loop abstractions
that could make SIMD easier to exploit, and how to nudge
programmers away from serial monotonic mindsets and into more of
a SIMD/FMA way of reasoning.