Very simple SIMD programming

Paulo Pinto pjmlp at progtools.org
Wed Oct 24 07:16:55 PDT 2012


On Wednesday, 24 October 2012 at 12:50:44 UTC, Manu wrote:
> On 24 October 2012 15:39, Paulo Pinto <pjmlp at progtools.org> 
> wrote:
>
>> On Wednesday, 24 October 2012 at 02:41:53 UTC, bearophile 
>> wrote:
>>
>>> I have found a nice paper, "Extending a C-like Language for 
>>> Portable SIMD
>>> Programming", (2012), by Roland L., Sebastian Hack and Ingo 
>>> Wald:
>>>
>>> http://www.cdl.uni-saarland.**de/projects/vecimp/vecimp_tr.**pdf<http://www.cdl.uni-saarland.de/projects/vecimp/vecimp_tr.pdf>
>>>
>>> SIMD programming is necessary in a system language, or in any 
>>> language
>>> that wants to use the modern CPUs well. So languages like C, 
>>> C++, D (and
>>> Mono-C#) support such wider registers.
>>>
>>> The authors of this paper have understood that it's also 
>>> important to
>>> make SIMD programming easy, almost as easy as scalar code, so 
>>> most
>>> programmers are able to write such kind of correct code.
>>>
>>> So this this paper presents ideas to better express SIMD 
>>> semantics in a
>>> C-like language. They introduce few new constructs in a large 
>>> subset of C
>>> language, with few ideas. The result coding patterns seem 
>>> easy enough (they
>>> are surely look simpler than most multi-core coding patterns 
>>> I've seen,
>>> including Cilk+).
>>>
>>>
>>> They present a simple scalar program in C:
>>>
>>> struct data_t {
>>>     int key;
>>>     int other;
>>> };
>>>
>>> int search(data_t* data , int N) {
>>>     for (int i = 0; i < N; i++) {
>>>         int x = data[i].key;
>>>         if (4 < x & x <= 8) return x;
>>>     }
>>>     return -1;
>>> }
>>>
>>>
>>> Then they explain the three most common ways to represent an 
>>> array of
>>> structs, here a struct that contains 3 values:
>>>
>>> x0 y0 z0 x1 y1 z1 x2 y2 z2 x3 y3 z3 x4 y4 z4 x5 y5 z5 x6 y6 
>>> z6 x7 y7 z7
>>> (a) Array of Structures (AoS)
>>>
>>> x0 x1 x2 x3 x4 x5 x6 x7   y0 y1 y2 y3 y4 y5 y6 y7   z0 z1 z2 
>>> z3 z4 z5 z6
>>> z7
>>> (b) Structure of Arrays (SoA)
>>>
>>> x0 x1 x2 x3 y0 y1 y2 y3 z0 z1 z2 z3 x4 x5 x6 x7 y4 y5 y6 y7 
>>> z4 z5 z6 z7
>>> (c) Hybrid Structure of Arrays (Hybrid SoA)
>>>
>>> They explain how the (c) is the preferred pattern in SIMD 
>>> programming.
>>>
>>>
>>> Using the (c) data pattern they show how in C with (nice) 
>>> SIMD intrinsics
>>> you write vectorized code (a simd_data_t struct instance 
>>> contains 8 int
>>> values):
>>>
>>> struct simd_data_t {
>>>     simd_int key;
>>>     simd_int other;
>>> };
>>>
>>> int search(simd_data_t* data , int N) {
>>>     for (int i = 0; i < N/L; ++i) {
>>>         simd_int x = load(data[i].key);
>>>         simd_int cmp = simd_and(simd_lt(4, x),
>>>         simd_le(x, 8));
>>>         int mask = simd_to_mask(cmp);
>>>         if (mask != 0) {
>>>             simd_int result = simd_and(mask , x);
>>>             for (int j = 0; j < log2(L); j++)
>>>                 result = simd_or(result ,
>>>                 whole_reg_shr(result , 1 << j));
>>>                 return simd_extract(result , 0);
>>>             }
>>>         }
>>>     return -1;
>>> }
>>>
>>>
>>> D should do become able to do this (that is not too much 
>>> bad), or better.
>>>
>>>
>>> Their C language extensions allow to write nicer code like:
>>>
>>> struct data_t {
>>>     int key;
>>>     int other;
>>> };
>>>
>>> int search(data_t *scalar data , int scalar N) {
>>>     int L = lengthof(*data);
>>>     for (int i = 0; i < N/L; ++i) {
>>>         int x = data[i].key;
>>>         if (4 < x & x <= 8)
>>>             int block[L] result = [x, 0];
>>>         scalar {
>>>             for (int j = 0; j < log2(L); ++j)
>>>                 result |= whole_reg_shr(result , 1 << j);
>>>             return get(x, 0);
>>>         }
>>>     }
>>>     return -1;
>>> }
>>>
>>>
>>> This is based on just few simple ideas, explained in the 
>>> paper (they are
>>> interesting, but quoting here those parts of the paper is not 
>>> a good idea).
>>> Such ideas are not directly portable to D (unless the 
>>> front-end is changed.
>>> Their compiler works by lowering, and emits regular C++ code 
>>> with
>>> intrinsics).
>>>
>>>
>>> Near the end of the paper they also propose some C++ library 
>>> code:
>>>
>>>  the C++ template mechanism would allow to define a hybrid 
>>> SoA container
>>>> class: Similar to std::vector which abstracts a traditional 
>>>> C array, one
>>>> could implement a wrapper around a T block[N]*:<
>>>>
>>>
>>>
>>> // scalar context throughout this example
>>> struct vec3 { float x, y, z; };
>>> // vec3 block[N]* pointing to ceil(n/N) elements
>>> hsoa <vec3 > vecs(n);
>>> // preferred vector length of vec3 automatically derived
>>> static const int N = hsoa <vec3 >::vector_length;
>>> int i = /*...*/
>>> hsoa <vec3 >::block_index ii = /*...*/
>>> vec3 v = vecs[i]; // gather
>>> vecs[i] = v; // scatter
>>> vec3 block[N] w = vecs[ii]; // fetch whole block
>>> hsoa <vec3 >::ref r = vecs[i]; // get proxy to a scalar
>>> r = v; // pipe through proxy
>>> // for each element
>>> vecs.foreach([](vec3& scalar v) { /*...*/ });
>>>
>>>
>>> Regardless of the other ideas of their C-like language, a 
>>> similar struct
>>> should be added to Phobos once a bit higher level SIMD 
>>> support is in better
>>> shape in D. Supporting Hybrid-SoA and few operations on it 
>>> will be an
>>> important but probably quite short and simple addition to 
>>> Phobos
>>> collections (it's essentially an struct that acts like an 
>>> array, with few
>>> simple extra operations).
>>>
>>> I think no commonly used language allows both very simple and 
>>> quite
>>> efficient SIMD programming (Scala, CUDA, C, C++, C#, Java, 
>>> Go, and
>>> currently Rust too, are not able to support SIMD programming 
>>> well. I think
>>> currently Haskell too is not supporting it well, but Haskell 
>>> is very
>>> flexible, and it's compiled by a native compiler, so such 
>>> things are maybe
>>> possible to add). So supporting it well in D will be an 
>>> interesting selling
>>> point of D. (Supporting a very simple SIMD coding in D will 
>>> make D more
>>> widespread, but such kind of programming will probably keep 
>>> being a small
>>> niche).
>>>
>>> Bye,
>>> bearophile
>>>
>>
>>
>> Actually, I am yet to see any language that has SIMD as part 
>> of the
>> language standard and not as an extension where each vendor 
>> does its own
>> way.
>>
>
> HLSL, GLSL, Cg? :)

I was thinking about general purpose programming languages, not 
domain specific ones.

--
Paulo




More information about the Digitalmars-d mailing list