SIMD support...

Marco Leise Marco.Leise at gmx.de
Mon Jan 16 05:06:59 PST 2012


On 15.01.2012, 11:45, Manu <turkeyman at gmail.com> wrote:

> On 15 January 2012 08:16, Sean Cavanaugh <WorksOnMyMachine at gmail.com> wrote:
>
>> On 1/15/2012 12:09 AM, Walter Bright wrote:
>>
>>> On 1/14/2012 9:58 PM, Sean Cavanaugh wrote:
>>>
>>>> MS has three types, __m128, __m128i and __m128d (float, int, double)
>>>>
>>>> Six if you count AVX's 256 forms.
>>>>
>>>> On 1/7/2012 6:54 PM, Peter Alexander wrote:
>>>>
>>>>> On 7/01/12 9:28 PM, Andrei Alexandrescu wrote:
>>>>> I agree with Manu that we should just have a single type like __m128 in
>>>>> MSVC. The other types and their conversions should be solvable in a
>>>>> library with something like strong typedefs.
>>>>>
>>>>>
>>> The trouble with MS's scheme is that, given the following:
>>>
>>> __m128i v;
>>> v += 2;
>>>
>>> the compiler can't tell what to do (add 2 to each of 16 bytes, 8 shorts,
>>> 4 ints or 2 longs?). With D,
>>>
>>> int4 v;
>>> v += 2;
>>>
>>> it's clear (add 2 to each of the 4 ints).
>>>
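To make Walter's point concrete: with SSE2 intrinsics in C, the same `__m128i` bits give different results for "add 2" depending on which lane width the programmer picks, so the type alone cannot decide. This is a minimal sketch, assuming an x86 compiler with SSE2 (`emmintrin.h`); the helper names `add2_int4`, `add2_byte16` and `low_lane` are hypothetical.

```c
#include <emmintrin.h> /* SSE2: __m128i, _mm_add_epi8, _mm_add_epi32, ... */
#include <stdint.h>

/* Hypothetical helper: treat the 128-bit register as 4 x int32, add 2 to each lane. */
static __m128i add2_int4(__m128i v) {
    return _mm_add_epi32(v, _mm_set1_epi32(2));
}

/* Hypothetical helper: treat the same 128 bits as 16 x int8, add 2 to each lane. */
static __m128i add2_byte16(__m128i v) {
    return _mm_add_epi8(v, _mm_set1_epi8(2));
}

/* Hypothetical helper: return the lowest 32 bits so results compare as plain ints. */
static uint32_t low_lane(__m128i v) {
    return (uint32_t)_mm_cvtsi128_si32(v);
}
```

Starting from all-0xFF bits, the int4 interpretation leaves 0x00000001 in the low 32-bit lane, while the byte interpretation leaves 0x01010101. A typed `int4` removes that ambiguity.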
>>
>> Working with their intrinsics in their raw form for real code is pure
>> insanity :)  You need to wrap it all in a good math library (even if 90%
>> of the library is just the intrinsics wrapped in __forceinline functions),
>> so you can start having sensible operator overloads and can write code
>> that is readable.
>>
>>
>> if (any4(a > b))
>> {
>>  // do stuff
>> }
>>
>>
>> is way way way better than (pseudocode)
>>
>> if (_mm_movemask_ps(_mm_cmpgt_ps(a, b)) != 0)
>> {
>> }
>>
>>
>>
>> and (if the ternary operator were overloadable in C++)
>>
>> float4 foo = (a > b) ? c : d;
>>
>> would be better than
>>
>> __m128 mask = _mm_cmpgt_ps(a, b);
>> __m128 foo = _mm_or_ps(_mm_and_ps(mask, c), _mm_andnot_ps(mask, d));
>>
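Sean's point about wrapping the raw intrinsics in a small math library can be sketched in C with SSE (`xmmintrin.h`). The names `float4`, `gt4`, `any4`, `all4` and `select4` are hypothetical library helpers, not part of any compiler's API; the underlying real intrinsics are `_mm_cmpgt_ps`, `_mm_movemask_ps`, `_mm_and_ps`, `_mm_andnot_ps` and `_mm_or_ps`.

```c
#include <xmmintrin.h> /* SSE: __m128, _mm_cmpgt_ps, _mm_movemask_ps, ... */

typedef __m128 float4; /* hypothetical library name for 4 packed floats */

/* Per-lane a > b; each lane of the result is all-ones (true) or all-zeros (false). */
static inline float4 gt4(float4 a, float4 b) {
    return _mm_cmpgt_ps(a, b);
}

/* Nonzero if at least one lane of the comparison mask is true. */
static inline int any4(float4 mask) {
    return _mm_movemask_ps(mask) != 0;
}

/* Nonzero only if all four lanes of the comparison mask are true. */
static inline int all4(float4 mask) {
    return _mm_movemask_ps(mask) == 0x0F;
}

/* Per-lane select, mask ? c : d: (mask & c) | (~mask & d). */
static inline float4 select4(float4 mask, float4 c, float4 d) {
    return _mm_or_ps(_mm_and_ps(mask, c), _mm_andnot_ps(mask, d));
}
```

With these, `if (any4(gt4(a, b)))` and `select4(gt4(a, b), c, d)` read close to the `any4(a > b)` and `(a > b) ? c : d` forms above; real operator syntax still needs C++ overloading or built-in vector types like the ones discussed in this thread.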
>
> Yep, it's coming... baby steps :)
>
> Walter: I told you games devs would be all over this! :P

And compression algorithms, too. I found one written in C that uses
external .asm files, assembled into object files with NASM and passed on
the linker command line. They contain MMX or SSE code, depending on the
processor you plan to target. The author claims that the MMX versions of
the 'outsourced' routines run 8x faster. I didn't verify this, but the idea
that these instructions become part of the language and easy to use for
regular programmers like me (and not just console game developers) is
exciting. I bet more programs could benefit from SSE than is obvious, and
that more code could be rewritten so that multiple data sets are processed
simultaneously.
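Processing several data elements at once can be illustrated with a kernel common in compression and video encoding: the sum of absolute differences (SAD) between two byte buffers. This is a minimal sketch in C, assuming an x86 compiler with SSE2; it is not the code from the compressor mentioned above (which uses MMX), and the function names `sad_scalar` and `sad_sse2` are hypothetical.

```c
#include <emmintrin.h> /* SSE2: _mm_sad_epu8, _mm_loadu_si128, ... */
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Scalar reference: sum of |a[i] - b[i]| over n bytes, one byte per iteration. */
static uint32_t sad_scalar(const uint8_t *a, const uint8_t *b, size_t n) {
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += (uint32_t)abs((int)a[i] - (int)b[i]);
    return sum;
}

/* SSE2 version: 16 byte pairs per iteration. _mm_sad_epu8 produces two
   partial sums, one in each 64-bit half of its result. */
static uint32_t sad_sse2(const uint8_t *a, const uint8_t *b, size_t n) {
    __m128i acc = _mm_setzero_si128();
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        acc = _mm_add_epi64(acc, _mm_sad_epu8(va, vb));
    }
    /* Fold the two 64-bit partial sums: low half + high half. */
    __m128i hi = _mm_unpackhi_epi64(acc, acc);
    uint32_t sum = (uint32_t)_mm_cvtsi128_si32(_mm_add_epi64(acc, hi));
    /* Handle the remaining n % 16 bytes with the scalar loop. */
    return sum + sad_scalar(a + i, b + i, n - i);
}
```

The SSE2 loop handles 16 byte pairs per `_mm_sad_epu8` where the scalar loop handles one; the actual speedup depends on the data and the processor.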


More information about the Digitalmars-d mailing list