64-bit and SSE
Nick Sabalausky
a at a.a
Tue Mar 2 15:13:57 PST 2010
"Don" <nospam at nospam.com> wrote in message
news:hmk01v$1u32$1 at digitalmars.com...
> retard wrote:
>> Tue, 02 Mar 2010 14:17:12 -0500, Nick Sabalausky wrote:
>>
>>> "retard" <re at tard.com.invalid> wrote in message
>>> news:hmjmjd$15uj$1 at digitalmars.com...
>>>> Tue, 02 Mar 2010 15:49:02 +0000, dsimcha wrote:
>>>>
>>>>> Given that Walter has indicated that 64-bit support is on the agenda
>>>>> for after D2 is finished and x87 is deprecated in 64-bit mode, will we
>>>>> also see SSE(2) support in DMD in the relatively near future? If so,
>>>>> will it be exposed as a compiler option even when compiling in 32-bit
>>>>> mode?
>>>>>
>>>>> I've realized that this is kind of important for me since Intel
>>>>> deprecated x87 on its Core 2 and Pentium 4 chips, meaning any old
>>>>> school floating point code runs painfully slow compared to, say, an
>>>>> AMD chip that still has a decent x87.
>>>> SSE(2) ? Don't people already use SSE 4.2 and prepare for AVX?
>>> Yes. The ones who enjoy arbitrarily shrinking their potential user base.
>>
>> Why not dynamic code path selection:
>>
>> if (cpu_capabilities & SSE4_2)
>>     run_fast_method();
>> else if (cpu_capabilities & SSE2)
>>     run_medium_fast_method();
>> else
>>     run_slow_method();
>>
>> One could also use higher level design patterns like abstract factories
>> here.
>
> The method needs to be fairly large for that to be beneficial. For
> fine-grained stuff, like basic operations on 3D vectors, it doesn't work
> at all. And that's one of the primary use cases for SSE.
You can still just increase the granularity as needed, by moving the CPU
check out of the small functions and up into a caller that does more work
per check. For instance, take this code, which is too fine-grained:
-------------------------------------------
// The CPU check runs on every call, i.e. twice per element.
void fineGrainedA(Param p)
{
    if(supports_SSE4)      { /* Use SSE4 */ }
    else if(supports_SSE2) { /* Use SSE2 */ }
    else                   { /* Use default */ }
}
void fineGrainedB(Param p)
{
    if(supports_SSE4)      { /* Use SSE4 */ }
    else if(supports_SSE2) { /* Use SSE2 */ }
    else                   { /* Use default */ }
}
void foo()
{
    foreach(thing; bunchOThings)
    {
        fineGrainedA(thing);
        fineGrainedB(thing);
    }
}
-------------------------------------------
That can be turned into this (and a smart optimizer could probably do it
automatically, especially if it's the compiler that's internally generating
'fineGrainedA' and 'fineGrainedB' in the first place):
-------------------------------------------
enum CPUVer { SSE4, SSE2, Default }

// Each instantiation keeps only one branch: no runtime CPU check inside.
void fineGrainedA(CPUVer ver)(Param p)
{
    static if(ver == CPUVer.SSE4)      { /* Use SSE4 */ }
    else static if(ver == CPUVer.SSE2) { /* Use SSE2 */ }
    else                               { /* Use default */ }
}
void fineGrainedB(CPUVer ver)(Param p)
{
    static if(ver == CPUVer.SSE4)      { /* Use SSE4 */ }
    else static if(ver == CPUVer.SSE2) { /* Use SSE2 */ }
    else                               { /* Use default */ }
}
// The whole loop is instantiated once per CPUVer.
void fooImpl(CPUVer ver)()
{
    foreach(thing; bunchOThings)
    {
        fineGrainedA!(ver)(thing);
        fineGrainedB!(ver)(thing);
    }
}
// The only runtime check: once per call to foo, not once per element.
void foo()
{
    if(supports_SSE4)
        fooImpl!(CPUVer.SSE4)();
    else if(supports_SSE2)
        fooImpl!(CPUVer.SSE2)();
    else
        fooImpl!(CPUVer.Default)();
}
-------------------------------------------
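For anyone who wants to try the pattern, here is a self-contained D sketch of it that compiles and runs as-is. Assumptions: `Param` is taken to be `float`, the kernel bodies are hypothetical scalar placeholders (every version computes `p * 2 + 1` summed over the input), and the post's `supports_SSE4`/`supports_SSE2` are replaced by druntime's `core.cpuid` queries `sse41` and `sse2`.

```d
import core.cpuid : sse2, sse41;
import std.stdio : writefln;

enum CPUVer { SSE4, SSE2, Default }

// Hypothetical: the post leaves 'Param' abstract; float is assumed here.
alias Param = float;

// Compile-time-selected kernels: each instantiation contains exactly one
// branch and no runtime CPU test. All bodies are scalar placeholders.
float fineGrainedA(CPUVer ver)(Param p)
{
    static if (ver == CPUVer.SSE4)      { return p * 2.0f; } // SSE4 code would go here
    else static if (ver == CPUVer.SSE2) { return p * 2.0f; } // SSE2 code would go here
    else                                { return p * 2.0f; } // portable fallback
}

float fineGrainedB(CPUVer ver)(Param p)
{
    static if (ver == CPUVer.SSE4)      { return p + 1.0f; }
    else static if (ver == CPUVer.SSE2) { return p + 1.0f; }
    else                                { return p + 1.0f; }
}

// The loop is instantiated once per CPUVer, so the per-element calls
// contain no branches on CPU features.
float fooImpl(CPUVer ver)(const(Param)[] things)
{
    float total = 0.0f;
    foreach (thing; things)
        total += fineGrainedB!ver(fineGrainedA!ver(thing));
    return total;
}

// Runtime detection via core.cpuid ("SSE4" mapped to SSE4.1 here).
CPUVer detectCPUVer()
{
    if (sse41) return CPUVer.SSE4;
    if (sse2)  return CPUVer.SSE2;
    return CPUVer.Default;
}

// The CPU is checked once per call to foo, not once per element.
float foo(const(Param)[] things)
{
    immutable ver = detectCPUVer();
    if (ver == CPUVer.SSE4)
        return fooImpl!(CPUVer.SSE4)(things);
    else if (ver == CPUVer.SSE2)
        return fooImpl!(CPUVer.SSE2)(things);
    else
        return fooImpl!(CPUVer.Default)(things);
}

void main()
{
    const Param[] data = [1.0f, 2.0f, 3.0f];
    writefln("detected %s, result %s", detectCPUVer(), foo(data));
}
```

Since all instantiations compute the same result here, the sketch can be checked on any machine; in real code only the bodies inside the `static if` branches would differ.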
And if foo gets called a lot, like in some loop, you can just take things
another level out: template foo's caller the same way and move the CPU check
up into it.
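Carried to its limit, moving the check outward means inspecting the CPU only once, at startup. One way to do that, sketched below as a swapped-in variant (closer to the factory idea quoted earlier than to the post's templated caller), is to pick the specialized instantiation once and keep it in a function pointer, so the hot loop calls through the pointer with no feature test. Assumptions: `fooImpl` here is a hypothetical placeholder kernel, `selectFoo` and `FooFn` are illustrative names, and detection uses druntime's `core.cpuid`. The trade-off is one indirect call per `foo` call instead of none.

```d
import core.cpuid : sse2, sse41;
import std.stdio : writeln;

enum CPUVer { SSE4, SSE2, Default }

// Hypothetical stand-in for the post's fooImpl. Real code would put
// version-specific bodies in 'static if (ver == ...)' branches; here every
// version computes the same scalar result so the sketch runs anywhere.
float fooImpl(CPUVer ver)(const(float)[] things)
{
    float total = 0.0f;
    foreach (t; things)
        total += t * 2.0f + 1.0f;
    return total;
}

alias FooFn = float function(const(float)[]);

// The CPU is inspected exactly once, here; the result is a plain
// function pointer to the matching template instantiation.
FooFn selectFoo()
{
    if (sse41)
        return &fooImpl!(CPUVer.SSE4);
    else if (sse2)
        return &fooImpl!(CPUVer.SSE2);
    else
        return &fooImpl!(CPUVer.Default);
}

void main()
{
    FooFn fooFast = selectFoo();
    const float[] data = [1.0f, 2.0f, 3.0f];
    float total = 0.0f;
    foreach (frame; 0 .. 4)
        total += fooFast(data); // no feature test inside this loop
    writeln(total);
}
```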
More information about the Digitalmars-d mailing list