SSE in D

Sun Oct 3 04:45:32 PDT 2010

>uses SSE registers too if your CPU (detected at runtime) supports them.
How is this done? By selecting between code paths after a call to cpuid?
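
To show what I mean by "code paths after cpuid", here is a rough C sketch.
It is only my guess at the general pattern, not the actual D runtime code.
It assumes GCC or Clang on x86, where __builtin_cpu_supports wraps the cpuid
check; add_scalar, add_sse and add_floats are names I made up.

```c
#include <stddef.h>     /* size_t */
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_add_ps, ... */

/* Portable fallback: one float at a time. */
static void add_scalar(const float *a, const float *b, float *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* SSE path: four floats per iteration, scalar loop for the leftovers. */
static void add_sse(const float *a, const float *b, float *c, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);  /* unaligned loads, so any */
        __m128 vb = _mm_loadu_ps(b + i);  /* pointer alignment works */
        _mm_storeu_ps(c + i, _mm_add_ps(va, vb));
    }
    add_scalar(a + i, b + i, c + i, n - i);
}

/* Hypothetical entry point: pick a code path from the CPUID feature bits.
   __builtin_cpu_supports is a GCC/Clang builtin for x86, used here as a
   stand-in for a hand-written cpuid check. */
void add_floats(const float *a, const float *b, float *c, size_t n)
{
    if (__builtin_cpu_supports("sse"))
        add_sse(a, b, c, n);
    else
        add_scalar(a, b, c, n);
}
```

The check runs on every call in this sketch; a real library would presumably
cache the result or store a function pointer once at startup.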

I can see the appeal of cleaning up the syntax by replacing intrinsics with
array operators. But what if I want to shuffle, for instance? Would it be
possible to overload >> for that, or something similar? And what would it
shuffle: 4 elements at a time, or the entire array? Say I want to rotate the
elements one position to the right, like this:
a b c d --> d a b c
(_mm_shuffle_ps(array, array, _MM_SHUFFLE(2, 1, 0, 3));)
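
To make the lane order concrete, here is that rotation as a small C sketch
(C because that is where the intrinsics live; rotate_right_ps and
rotate_right4 are helper names I made up for illustration):

```c
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_shuffle_ps, ... */

/* Rotate the four floats one lane to the right: (a, b, c, d) -> (d, a, b, c).
   rotate_right_ps is a hypothetical helper, not a library function. */
static __m128 rotate_right_ps(__m128 v)
{
    /* _MM_SHUFFLE lists source lanes from the highest result lane down to
       the lowest, so result lanes 0..3 take source lanes 3, 0, 1, 2. */
    return _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 1, 0, 3));
}

/* Convenience wrapper for plain arrays; uses unaligned loads and stores. */
static void rotate_right4(const float in[4], float out[4])
{
    _mm_storeu_ps(out, rotate_right_ps(_mm_loadu_ps(in)));
}
```

This is the operation I would like to be able to express with array
operators, without giving up the single-instruction cost.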

I ask because I need this kind of functionality to implement matrices and
the like using SSE. What would my alternative be? Implementing "xmmintrin.h"
with small pieces of inline asm? That wouldn't yield any speedup, though, if
the functions don't get inlined, would it?

On 3 October 2010 03:34, bearophile <bearophileHUGS at lycos.com> wrote:

> Emil Madsen:
>
> You are asking many different things, let's disentangle your questions a
> little.
>
> >Is there a D equivalent of the "xmmintrin.h", or any other convenient way
> of doing SSE in D?<
>
> The D2 language is not designed to be an academic language; it's designed to
> be a reasonably practical one (even though some of its features are not just
> buggy or unfinished, but also embody new design ideas that are far from
> "battle tested", so no one knows whether they will actually turn out well
> in large or very large D2 programs).
>
> But its implementation is not fully practical yet. In a compiler like GCC
> you can find a ton of dirty or smelly little features, absent from the C
> standard, that turn out to be practically useful or even almost necessary
> for real-world code. The D2 compiler lacks a large number of such dirty
> utility corner cases. Even the (D1) compiler LDC has some of these
> necessary dirty little features, like the allow_inline pragma that allows
> inlining of functions that contain asm, and so on. I guess that when D2 is
> more finished, and some people write a more efficient implementation of
> D2, those little smelly things will be added in abundance.
>
> The dirty little xmmintrin intrinsics are absent from DMD and D, both in
> practice and by design. GCC C is not designed much: they just add those SIMD
> operations to the ball of mud named GNU C (plus handy operator overloading
> if you want to add or multiply two registers represented as special arrays
> of doubles or floats or ints). D here is designed in a somewhat more
> idealistic way, and it tries to be semantically cleaner, so instead of those
> intrinsics you are supposed to use vector operations on arrays (both static
> and dynamic).
>
> Many of these operations are already implemented and more or less work,
> but unless your arrays are large they usually slow your code down, because
> they are chunks of pre-written asm (that use SSE+ registers too) designed
> for large arrays, and they are not inlined. In theory, in the future the
> D front-end will be able to replace a sum of two 4-float static arrays with
> a single SSE instruction (or little more) (if you have compiled the code for
> SSE-enabled CPUs). In practice DMD is far from this point, and the
> development efforts are (rightly!) focused on finishing core features and
> removing the worst implementation (or even design) bugs. Optimizing code
> generation is for later.
>
>
> > - I've been looking into the Array Operators, but will those work, for
> > instance if I'm doing something alike:
> > a[3], b[4]
> > c[4] = a+b;
>
> The right D syntax is:
>
> float[4] a, b, c;
> c[] = a[] + b[];
>
> You must always use [] after the array name. Arrays must have the same
> length.
>
> And currently you can't use this syntax:
>
> void main() {
>    float[4] a, b;
>    float[4] c[] = a[] + b[];
> }
>
>
> That gives the error:
>
> test.d(3): Error: cannot implicitly convert expression (a[] + b[]) of type
> float[] to float[4u][]
>
> Probably because of an unforeseen design bug: a collision between D syntax
> and the C-style declaration syntax that D still accepts.
>
> See this bug report for more info about this design problem, that so far
> most people (including the main designers) seem to happily ignore:
> http://d.puremagic.com/issues/show_bug.cgi?id=3971
> Here I have suggested a possible solution, the introduction of a -cstyle
> compiler flag, that was ignored even more:
> http://d.puremagic.com/issues/show_bug.cgi?id=4580
>
> So this code works:
>
> void main() {
>    float[4] a, b, c;
>    c[] = a[] + b[];
> }
>
> But it calls a pre-written asm routine that computes the vector sum c=a+b,
> and that routine uses SSE registers too if your CPU (detected at runtime)
> supports them.
>
>
> > and when will the compiler write SSE asm for the array operators?
>
> DMD currently never writes SSE asm, unless you use those instructions
> yourself in inline asm code. The 64 bit DMD will probably be able to use
> those registers too, but I have no idea whether the 32 bit DMD will then use
> them as well. I hope so, but I have little hope. I'd like to know this.
>
> D1 LDC now uses SSE registers for most of its floating point operations
> because LLVM is very bad at using the X86 floating point stack.
>
> Low-level D code written for D1 LDC is usually about as efficient as C code
> written for GCC. This is a very good thing. But recently the development of
> LDC has slowed down a lot: there is no D2 version of it, it's not updated
> to the latest versions of LLVM, and there's no Windows support, because LLVM
> devs are paid by Apple and they don't care to make LLVM work fully (== with
> exceptions too) on Windows; they just need to give people the illusion
> that LLVM is multi-platform. I used to help LLVM development, but I have
> stopped until they add good support for exceptions on Windows.
>
> There is a GCC-based D compiler too, named GDC, and I think it works, but I
> have never appreciated it much on Windows. Other people may give you
> more/better info on it.
>
>
> > - is there a target=architecture for the compiler? or will it simply
> > write SSE if one defines something alike -msse4? -
>
> LDC D1 allows you to specify the target a little, while I think DMD always
> targets a Pentium1.
>
>
> > I'm having a bit of trouble finding stuff
> > about SSE for D, sources on the subject anyone?
>
> There is not much to search :-)
>
> Bye,
> bearophile
>

-- 
// Yours sincerely
// Emil 'Skeen' Madsen