Very simple SIMD programming
Paulo Pinto
pjmlp at progtools.org
Wed Oct 24 07:16:55 PDT 2012
On Wednesday, 24 October 2012 at 12:50:44 UTC, Manu wrote:
> On 24 October 2012 15:39, Paulo Pinto <pjmlp at progtools.org>
> wrote:
>
>> On Wednesday, 24 October 2012 at 02:41:53 UTC, bearophile
>> wrote:
>>
>>> I have found a nice paper, "Extending a C-like Language for
>>> Portable SIMD
>>> Programming", (2012), by Roland L., Sebastian Hack and Ingo
>>> Wald:
>>>
>>> http://www.cdl.uni-saarland.**de/projects/vecimp/vecimp_tr.**pdf<http://www.cdl.uni-saarland.de/projects/vecimp/vecimp_tr.pdf>
>>>
>>> SIMD programming is necessary in a system language, or in any
>>> language
>>> that wants to use the modern CPUs well. So languages like C,
>>> C++, D (and
>>> Mono-C#) support such wider registers.
>>>
>>> The authors of this paper have understood that it's also
>>> important to
>>> make SIMD programming easy, almost as easy as scalar code, so
>>> most
>>> programmers are able to write such kind of correct code.
>>>
>>> So this this paper presents ideas to better express SIMD
>>> semantics in a
>>> C-like language. They introduce few new constructs in a large
>>> subset of C
>>> language, with few ideas. The result coding patterns seem
>>> easy enough (they
>>> are surely look simpler than most multi-core coding patterns
>>> I've seen,
>>> including Cilk+).
>>>
>>>
>>> They present a simple scalar program in C:
>>>
>>> struct data_t {
>>> int key;
>>> int other;
>>> };
>>>
>>> int search(data_t* data , int N) {
>>> for (int i = 0; i < N; i++) {
>>> int x = data[i].key;
>>> if (4 < x & x <= 8) return x;
>>> }
>>> return -1;
>>> }
>>>
>>>
>>> Then they explain the three most common ways to represent an
>>> array of
>>> structs, here a struct that contains 3 values:
>>>
>>> x0 y0 z0 x1 y1 z1 x2 y2 z2 x3 y3 z3 x4 y4 z4 x5 y5 z5 x6 y6
>>> z6 x7 y7 z7
>>> (a) Array of Structures (AoS)
>>>
>>> x0 x1 x2 x3 x4 x5 x6 x7 y0 y1 y2 y3 y4 y5 y6 y7 z0 z1 z2
>>> z3 z4 z5 z6
>>> z7
>>> (b) Structure of Arrays (SoA)
>>>
>>> x0 x1 x2 x3 y0 y1 y2 y3 z0 z1 z2 z3 x4 x5 x6 x7 y4 y5 y6 y7
>>> z4 z5 z6 z7
>>> (c) Hybrid Structure of Arrays (Hybrid SoA)
>>>
>>> They explain how the (c) is the preferred pattern in SIMD
>>> programming.
>>>
>>>
>>> Using the (c) data pattern they show how in C with (nice)
>>> SIMD intrinsics
>>> you write vectorized code (a simd_data_t struct instance
>>> contains 8 int
>>> values):
>>>
>>> struct simd_data_t {
>>> simd_int key;
>>> simd_int other;
>>> };
>>>
>>> int search(simd_data_t* data , int N) {
>>> for (int i = 0; i < N/L; ++i) {
>>> simd_int x = load(data[i].key);
>>> simd_int cmp = simd_and(simd_lt(4, x),
>>> simd_le(x, 8));
>>> int mask = simd_to_mask(cmp);
>>> if (mask != 0) {
>>> simd_int result = simd_and(mask , x);
>>> for (int j = 0; j < log2(L); j++)
>>> result = simd_or(result ,
>>> whole_reg_shr(result , 1 << j));
>>> return simd_extract(result , 0);
>>> }
>>> }
>>> return -1;
>>> }
>>>
>>>
>>> D should do become able to do this (that is not too much
>>> bad), or better.
>>>
>>>
>>> Their C language extensions allow to write nicer code like:
>>>
>>> struct data_t {
>>> int key;
>>> int other;
>>> };
>>>
>>> int search(data_t *scalar data , int scalar N) {
>>> int L = lengthof(*data);
>>> for (int i = 0; i < N/L; ++i) {
>>> int x = data[i].key;
>>> if (4 < x & x <= 8)
>>> int block[L] result = [x, 0];
>>> scalar {
>>> for (int j = 0; j < log2(L); ++j)
>>> result |= whole_reg_shr(result , 1 << j);
>>> return get(x, 0);
>>> }
>>> }
>>> return -1;
>>> }
>>>
>>>
>>> This is based on just few simple ideas, explained in the
>>> paper (they are
>>> interesting, but quoting here those parts of the paper is not
>>> a good idea).
>>> Such ideas are not directly portable to D (unless the
>>> front-end is changed.
>>> Their compiler works by lowering, and emits regular C++ code
>>> with
>>> intrinsics).
>>>
>>>
>>> Near the end of the paper they also propose some C++ library
>>> code:
>>>
>>> the C++ template mechanism would allow to define a hybrid
>>> SoA container
>>>> class: Similar to std::vector which abstracts a traditional
>>>> C array, one
>>>> could implement a wrapper around a T block[N]*:<
>>>>
>>>
>>>
>>> // scalar context throughout this example
>>> struct vec3 { float x, y, z; };
>>> // vec3 block[N]* pointing to ceil(n/N) elements
>>> hsoa <vec3 > vecs(n);
>>> // preferred vector length of vec3 automatically derived
>>> static const int N = hsoa <vec3 >::vector_length;
>>> int i = /*...*/
>>> hsoa <vec3 >::block_index ii = /*...*/
>>> vec3 v = vecs[i]; // gather
>>> vecs[i] = v; // scatter
>>> vec3 block[N] w = vecs[ii]; // fetch whole block
>>> hsoa <vec3 >::ref r = vecs[i]; // get proxy to a scalar
>>> r = v; // pipe through proxy
>>> // for each element
>>> vecs.foreach([](vec3& scalar v) { /*...*/ });
>>>
>>>
>>> Regardless of the other ideas of their C-like language, a
>>> similar struct
>>> should be added to Phobos once a bit higher level SIMD
>>> support is in better
>>> shape in D. Supporting Hybrid-SoA and few operations on it
>>> will be an
>>> important but probably quite short and simple addition to
>>> Phobos
>>> collections (it's essentially an struct that acts like an
>>> array, with few
>>> simple extra operations).
>>>
>>> I think no commonly used language allows both very simple and
>>> quite
>>> efficient SIMD programming (Scala, CUDA, C, C++, C#, Java,
>>> Go, and
>>> currently Rust too, are not able to support SIMD programming
>>> well. I think
>>> currently Haskell too is not supporting it well, but Haskell
>>> is very
>>> flexible, and it's compiled by a native compiler, so such
>>> things are maybe
>>> possible to add). So supporting it well in D will be an
>>> interesting selling
>>> point of D. (Supporting a very simple SIMD coding in D will
>>> make D more
>>> widespread, but such kind of programming will probably keep
>>> being a small
>>> niche).
>>>
>>> Bye,
>>> bearophile
>>>
>>
>>
>> Actually, I am yet to see any language that has SIMD as part
>> of the
>> language standard and not as an extension where each vendor
>> does its own
>> way.
>>
>
> HLSL, GLSL, Cg? :)
I was thinking about general purpose programming languages, not
domain specific ones.
--
Paulo
More information about the Digitalmars-d
mailing list