core.simd woes

Manu turkeyman at gmail.com
Tue Oct 2 03:17:48 PDT 2012


On 2 October 2012 05:28, F i L <witte2008 at gmail.com> wrote:

> Not to resurrect the dead, I just wanted to share an article I came across
> concerning SIMD with Manu..
>
> http://www.gamasutra.com/view/feature/4248/designing_fast_crossplatform_simd_.php
>
> QUOTE:
>
> 1. Returning results by value
>
> By observing the intrinsics interface a vector library must imitate that
> interface to maximize performance. Therefore, you must return the results
> by value and not by reference, as such:
>
>     //correct
>     inline Vec4 VAdd(Vec4 va, Vec4 vb)
>     {
>         return(_mm_add_ps(va, vb));
>     };
>
> On the other hand if the data is returned by reference the interface will
> generate code bloat. The incorrect version below:
>
>     //incorrect (code bloat!)
>     inline void VAddSlow(Vec4& vr, Vec4 va, Vec4 vb)
>     {
>         vr = _mm_add_ps(va, vb);
>     };
>
> The reason you must return data by value is because the quad-word
> (128-bit) fits nicely inside one SIMD register. And one of the key factors
> of a vector library is to keep the data inside these registers as much as
> possible. By doing that, you avoid unnecessary loads and stores operations
> from SIMD registers to memory or FPU registers. When combining multiple
> vector operations the "returned by value" interface allows the compiler to
> optimize these loads and stores easily by minimizing SIMD to FPU or memory
> transfers.
>
> 2. Data Declared "Purely"
>
> Here, "pure data" is defined as data declared outside a "class" or
> "struct" by a simple "typedef" or "define". When I was researching various
> vector libraries before coding VMath, I observed one common pattern among
> all libraries I looked at during that time. In all cases, developers
> wrapped the basic quad-word type inside a "class" or "struct" instead of
> declaring it purely, as follows:
>
>     class Vec4
>     {
>         ...
>     private:
>         __m128 xyzw;
>     };
>
> This type of data encapsulation is a common practice among C++ developers
> to make the architecture of the software robust. The data is protected and
> can be accessed only by the class interface functions. Nonetheless, this
> design causes code bloat by many different compilers in different
> platforms, especially if some sort of GCC port is being used.
>
> An approach that is much friendlier to the compiler is to declare the
> vector data "purely", as follows:
>
> typedef __m128 Vec4;
>
> ENDQUOTE;
>
>
>
>
> The article is 2 years old, but it appears my earlier performance issue
> wasn't specific to D, but is an issue with C as well. I think in this
> situation, it might be best (most optimized) to handle SIMD "the C way" by
> creating an alias or union of a SIMD intrinsic. D has a big advantage over
> C/C++ here because of UFCS, in that we can write external functions that
> appear no different to encapsulated object methods. That, combined with
> public aliasing, means the end-user only sees our pretty functions, but
> we're not sacrificing performance at all.
>

These are indeed common gotchas, but they don't necessarily apply to D, and
if they do, they should be reported as bugs and hopefully addressed. There
is no reason that D needs to follow these typical performance patterns from C.
It's worth noting that not all C compilers suffer from this problem. Many
(most, actually) compilers can recognise a struct with a single member and
treat it as if it were an instance of that member directly when it is passed
by value.
It only tends to be a problem on older games-console compilers.
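
To make the single-member-struct point concrete, here is a minimal C++
sketch. Everything named in it (Raw4, Vec4Pure, Vec4Wrapped, add) is
hypothetical and made up for illustration. Raw4 is only a portable stand-in
for a 128-bit register type such as __m128, so that the snippet compiles
without any platform intrinsics. The static_asserts check that the wrapper
adds no size, alignment or copy overhead over the raw type. They do not
prove that a given compiler passes the wrapper in a register: that depends
on the ABI and the optimiser, and can only be confirmed by reading the
generated code.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for a 128-bit SIMD register type such as __m128.
struct alignas(16) Raw4 {
    float f[4];
};

// "Pure" declaration in the article's sense: the alias *is* the raw type.
using Vec4Pure = Raw4;

// Encapsulated declaration: a class with a single private member.
class Vec4Wrapped {
public:
    explicit Vec4Wrapped(Raw4 r) : xyzw(r) {}
    Raw4 raw() const { return xyzw; }
private:
    Raw4 xyzw;
};

// The wrapper carries no layout overhead compared with the raw type.
static_assert(sizeof(Vec4Wrapped) == sizeof(Raw4), "wrapper adds no size");
static_assert(alignof(Vec4Wrapped) == alignof(Raw4), "wrapper keeps alignment");
static_assert(std::is_trivially_copyable<Vec4Wrapped>::value,
              "wrapper can be copied like the raw type");
static_assert(std::is_standard_layout<Vec4Wrapped>::value,
              "wrapper has a plain, predictable layout");

// By-value add on the raw type (scalar loop standing in for an intrinsic).
inline Raw4 add(Raw4 a, Raw4 b) {
    Raw4 r;
    for (std::size_t i = 0; i < 4; ++i) {
        r.f[i] = a.f[i] + b.f[i];
    }
    return r;
}

// By-value add on the wrapper: forwards to the raw version.
inline Vec4Wrapped add(Vec4Wrapped a, Vec4Wrapped b) {
    return Vec4Wrapped(add(a.raw(), b.raw()));
}
```

If a compiler does treat the wrapper like its member, the two add overloads
should compile to the same instructions once inlined. Comparing the
disassembly of both is the kind of check that would show whether a
particular toolchain suffers from the problem.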

As I said earlier, when I get back to finishing std.simd off (I presume
this will be some time after Walter has finished Win64 support), I'll go
through and scrutinise the code-gen for the API very thoroughly. We'll see
what that reveals. But I don't think there's any reason we should suffer
the same legacy C by-value code-gen problems in D... (hopefully I won't eat
those words ;)