64-bit and SSE

Tue Mar 2 15:13:57 PST 2010

"Don" <nospam at nospam.com> wrote in message 
news:hmk01v$1u32$1 at digitalmars.com...
> retard wrote:
>> Tue, 02 Mar 2010 14:17:12 -0500, Nick Sabalausky wrote:
>>
>>> "retard" <re at tard.com.invalid> wrote in message
>>> news:hmjmjd$15uj$1 at digitalmars.com...
>>>> Tue, 02 Mar 2010 15:49:02 +0000, dsimcha wrote:
>>>>
>>>>> Given that Walter has indicated that 64-bit support is on the agenda
>>>>> for after D2 is finished and x87 is deprecated in 64-bit mode, will we
>>>>> also see SSE(2) support in DMD in the relatively near future?  If so,
>>>>> will it be exposed as a compiler option even when compiling in 32-bit
>>>>> mode?
>>>>>
>>>>> I've realized that this is kind of important for me since Intel
>>>>> deprecated x87 on its Core 2 and Pentium 4 chips, meaning any old
>>>>> school floating point code runs painfully slow compared to, say, an
>>>>> AMD chip that still has a decent x87.
>>>> SSE(2) ? Don't people already use SSE 4.2 and prepare for AVX?
>>> Yes. The ones who enjoy arbitrarily shrinking their potential user base.
>>
>> Why not dynamic code path selection:
>>
>> if (cpu_capabilities & SSE4_2)
>>   run_fast_method();
>> else if (cpu_capabilities & SSE2)
>>   run_medium_fast_method();
>> else
>>   run_slow_method();
>>
>> One could also use higher level design patterns like abstract factories 
>> here.
>
> The method needs to be fairly large for that to be beneficial. For 
> fine-grained stuff, like basic operations on 3D vectors, it doesn't work 
> at all. And that's one of the primary use cases for SSE.

You can still just increase the grain-size as needed. For instance, take 
this example of code that is too fine-grained:

-------------------------------------------
void fineGrainedA(Param p)
{
    // Runtime check, repeated on every single call
    if(supports_SSE4)
        { /* Use SSE4 */ }
    else if(supports_SSE2)
        { /* Use SSE2 */ }
    else
        { /* Use Default */ }
}

void fineGrainedB(Param p)
{
    // Same runtime check again
    if(supports_SSE4)
        { /* Use SSE4 */ }
    else if(supports_SSE2)
        { /* Use SSE2 */ }
    else
        { /* Use Default */ }
}

void foo()
{
    foreach(thing; bunchOThings)
    {
        fineGrainedA(thing);
        fineGrainedB(thing);
    }
}
-------------------------------------------

That can be turned into this (and a smart optimizer could probably do it 
automatically, especially if it's the compiler that's internally generating 
'fineGrainedA' and 'fineGrainedB' in the first place):

-------------------------------------------
enum CPUVer { SSE4, SSE2, Default }

void fineGrainedA(CPUVer ver)(Param p)
{
    // Resolved at compile time: each instantiation keeps one branch
    static if(ver == CPUVer.SSE4)
        { /* Use SSE4 */ }
    else static if(ver == CPUVer.SSE2)
        { /* Use SSE2 */ }
    else
        { /* Use Default */ }
}

void fineGrainedB(CPUVer ver)(Param p)
{
    static if(ver == CPUVer.SSE4)
        { /* Use SSE4 */ }
    else static if(ver == CPUVer.SSE2)
        { /* Use SSE2 */ }
    else
        { /* Use Default */ }
}

void fooImpl(CPUVer ver)()
{
    foreach(thing; bunchOThings)
    {
        fineGrainedA!(ver)(thing);
        fineGrainedB!(ver)(thing);
    }
}

void foo()
{
    // The only runtime check: once per call to foo, not once per element
    if(supports_SSE4)
        fooImpl!(CPUVer.SSE4)();
    else if(supports_SSE2)
        fooImpl!(CPUVer.SSE2)();
    else
        fooImpl!(CPUVer.Default)();
}
-------------------------------------------
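
For anyone who wants something they can actually compile, here is a minimal sketch of the same idea. It assumes druntime's core.cpuid module (its sse2 and sse42 properties) as the source of the runtime flags. The names scale, scaleAllImpl, scaleAll and detect are made up for illustration, and the per-version bodies are placeholders that all do plain scalar work, so it runs on any CPU:

```d
import core.cpuid : sse2, sse42;

enum CPUVer { SSE4, SSE2, Default }

// Hypothetical fine-grained operation: scale a 4-float vector in place.
// The branches are placeholders for real SSE4/SSE2 code; all three do
// equivalent scalar work here.
void scale(CPUVer ver)(ref float[4] v, float k)
{
    static if (ver == CPUVer.SSE4)
        v[] *= k;                     // placeholder for an SSE4 path
    else static if (ver == CPUVer.SSE2)
        v[] *= k;                     // placeholder for an SSE2 path
    else
        foreach (ref x; v) x *= k;    // default path
}

// The loop is instantiated once per CPU version, so the inner call
// involves no runtime check at all.
void scaleAllImpl(CPUVer ver)(float[4][] vs, float k)
{
    foreach (ref v; vs)
        scale!(ver)(v, k);
}

// Map the runtime CPU flags onto the compile-time enum.
CPUVer detect()
{
    if (sse42) return CPUVer.SSE4;
    if (sse2)  return CPUVer.SSE2;
    return CPUVer.Default;
}

// Single runtime dispatch per call, outside the loop.
void scaleAll(float[4][] vs, float k)
{
    final switch (detect())
    {
        case CPUVer.SSE4:    scaleAllImpl!(CPUVer.SSE4)(vs, k);    break;
        case CPUVer.SSE2:    scaleAllImpl!(CPUVer.SSE2)(vs, k);    break;
        case CPUVer.Default: scaleAllImpl!(CPUVer.Default)(vs, k); break;
    }
}
```

Calling scaleAll(vs, 2) on an array of float[4] values doubles every component, whichever path gets picked.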

And if foo itself gets called a lot, like from inside some loop, you can just 
move the check out another level the same way.
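
One possible way to do that, sketched under the same assumptions (core.cpuid for detection, placeholder bodies), is to resolve the implementation to a function pointer once and have the caller reuse it. sumImpl, SumFn and pickSum are hypothetical names, not anything from DMD or Phobos:

```d
import core.cpuid : sse2, sse42;

enum CPUVer { SSE4, SSE2, Default }

// Hypothetical coarse-grained routine. A real version would differ per
// CPUVer; this placeholder body is the same for all three.
int sumImpl(CPUVer ver)(const(int)[] xs)
{
    int total = 0;
    foreach (x; xs)
        total += x;
    return total;
}

alias SumFn = int function(const(int)[]);

// Check the CPU once and hand back the matching instantiation.
SumFn pickSum()
{
    if (sse42) return &sumImpl!(CPUVer.SSE4);
    if (sse2)  return &sumImpl!(CPUVer.SSE2);
    return &sumImpl!(CPUVer.Default);
}
```

The caller does "auto sum = pickSum();" once, outside its loop, and then calls sum(xs) as often as it likes. Each call is still an indirect call, so this only pays off when the routine behind the pointer is reasonably coarse-grained.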