Very simple SIMD programming

Wed Oct 24 05:50:33 PDT 2012

On 24 October 2012 15:39, Paulo Pinto <pjmlp at progtools.org> wrote:

> On Wednesday, 24 October 2012 at 02:41:53 UTC, bearophile wrote:
>
>> I have found a nice paper, "Extending a C-like Language for Portable SIMD
>> Programming", (2012), by Roland L., Sebastian Hack and Ingo Wald:
>>
>> http://www.cdl.uni-saarland.**de/projects/vecimp/vecimp_tr.**pdf<http://www.cdl.uni-saarland.de/projects/vecimp/vecimp_tr.pdf>
>>
>> SIMD programming is necessary in a system language, or in any language
>> that wants to use the modern CPUs well. So languages like C, C++, D (and
>> Mono-C#) support such wider registers.
>>
>> The authors of this paper have understood that it's also important to
>> make SIMD programming easy, almost as easy as scalar code, so most
>> programmers are able to write such kind of correct code.
>>
>> So this this paper presents ideas to better express SIMD semantics in a
>> C-like language. They introduce few new constructs in a large subset of C
>> language, with few ideas. The result coding patterns seem easy enough (they
>> are surely look simpler than most multi-core coding patterns I've seen,
>> including Cilk+).
>>
>>
>> They present a simple scalar program in C:
>>
>> struct data_t {
>>     int key;
>>     int other;
>> };
>>
>> int search(data_t* data , int N) {
>>     for (int i = 0; i < N; i++) {
>>         int x = data[i].key;
>>         if (4 < x & x <= 8) return x;
>>     }
>>     return -1;
>> }
>>
>>
>> Then they explain the three most common ways to represent an array of
>> structs, here a struct that contains 3 values:
>>
>> x0 y0 z0 x1 y1 z1 x2 y2 z2 x3 y3 z3 x4 y4 z4 x5 y5 z5 x6 y6 z6 x7 y7 z7
>> (a) Array of Structures (AoS)
>>
>> x0 x1 x2 x3 x4 x5 x6 x7   y0 y1 y2 y3 y4 y5 y6 y7   z0 z1 z2 z3 z4 z5 z6
>> z7
>> (b) Structure of Arrays (SoA)
>>
>> x0 x1 x2 x3 y0 y1 y2 y3 z0 z1 z2 z3 x4 x5 x6 x7 y4 y5 y6 y7 z4 z5 z6 z7
>> (c) Hybrid Structure of Arrays (Hybrid SoA)
>>
>> They explain how the (c) is the preferred pattern in SIMD programming.
>>
>>
>> Using the (c) data pattern they show how in C with (nice) SIMD intrinsics
>> you write vectorized code (a simd_data_t struct instance contains 8 int
>> values):
>>
>> struct simd_data_t {
>>     simd_int key;
>>     simd_int other;
>> };
>>
>> int search(simd_data_t* data , int N) {
>>     for (int i = 0; i < N/L; ++i) {
>>         simd_int x = load(data[i].key);
>>         simd_int cmp = simd_and(simd_lt(4, x),
>>         simd_le(x, 8));
>>         int mask = simd_to_mask(cmp);
>>         if (mask != 0) {
>>             simd_int result = simd_and(mask , x);
>>             for (int j = 0; j < log2(L); j++)
>>                 result = simd_or(result ,
>>                 whole_reg_shr(result , 1 << j));
>>                 return simd_extract(result , 0);
>>             }
>>         }
>>     return -1;
>> }
>>
>>
>> D should do become able to do this (that is not too much bad), or better.
>>
>>
>> Their C language extensions allow to write nicer code like:
>>
>> struct data_t {
>>     int key;
>>     int other;
>> };
>>
>> int search(data_t *scalar data , int scalar N) {
>>     int L = lengthof(*data);
>>     for (int i = 0; i < N/L; ++i) {
>>         int x = data[i].key;
>>         if (4 < x & x <= 8)
>>             int block[L] result = [x, 0];
>>         scalar {
>>             for (int j = 0; j < log2(L); ++j)
>>                 result |= whole_reg_shr(result , 1 << j);
>>             return get(x, 0);
>>         }
>>     }
>>     return -1;
>> }
>>
>>
>> This is based on just few simple ideas, explained in the paper (they are
>> interesting, but quoting here those parts of the paper is not a good idea).
>> Such ideas are not directly portable to D (unless the front-end is changed.
>> Their compiler works by lowering, and emits regular C++ code with
>> intrinsics).
>>
>>
>> Near the end of the paper they also propose some C++ library code:
>>
>>  the C++ template mechanism would allow to define a hybrid SoA container
>>> class: Similar to std::vector which abstracts a traditional C array, one
>>> could implement a wrapper around a T block[N]*:<
>>>
>>
>>
>> // scalar context throughout this example
>> struct vec3 { float x, y, z; };
>> // vec3 block[N]* pointing to ceil(n/N) elements
>> hsoa <vec3 > vecs(n);
>> // preferred vector length of vec3 automatically derived
>> static const int N = hsoa <vec3 >::vector_length;
>> int i = /*...*/
>> hsoa <vec3 >::block_index ii = /*...*/
>> vec3 v = vecs[i]; // gather
>> vecs[i] = v; // scatter
>> vec3 block[N] w = vecs[ii]; // fetch whole block
>> hsoa <vec3 >::ref r = vecs[i]; // get proxy to a scalar
>> r = v; // pipe through proxy
>> // for each element
>> vecs.foreach([](vec3& scalar v) { /*...*/ });
>>
>>
>> Regardless of the other ideas of their C-like language, a similar struct
>> should be added to Phobos once a bit higher level SIMD support is in better
>> shape in D. Supporting Hybrid-SoA and few operations on it will be an
>> important but probably quite short and simple addition to Phobos
>> collections (it's essentially an struct that acts like an array, with few
>> simple extra operations).
>>
>> I think no commonly used language allows both very simple and quite
>> efficient SIMD programming (Scala, CUDA, C, C++, C#, Java, Go, and
>> currently Rust too, are not able to support SIMD programming well. I think
>> currently Haskell too is not supporting it well, but Haskell is very
>> flexible, and it's compiled by a native compiler, so such things are maybe
>> possible to add). So supporting it well in D will be an interesting selling
>> point of D. (Supporting a very simple SIMD coding in D will make D more
>> widespread, but such kind of programming will probably keep being a small
>> niche).
>>
>> Bye,
>> bearophile
>>
>
>
> Actually, I am yet to see any language that has SIMD as part of the
> language standard and not as an extension where each vendor does its own
> way.
>

HLSL, GLSL, Cg? :)
I don't think it's possible considering that D is designed to plug on to
various backends.
D already has what's required to do some fairly nice (by comparison) simd
stuff with good supporting libraries.

One thing I can think of that would really improve simd (and not only simd)
would be a way to define compound operators.
If the library could detect/hook sequences of operations and implement them
more efficiently as a compound, that would make some very powerful
optimisations available.

Simple example:
  T opCompound(string seq)(T a, T b, T c) if(seq == "* +") { return
_madd(a, b, c); }
  T opCompound(string seq)(T a, T b, T c) if(seq == "+ *") { return
_madd(b, c, a); }
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.puremagic.com/pipermail/digitalmars-d/attachments/20121024/0e552b1c/attachment.html>