iasm, unexpectedly slower than DMD production
Basile B. via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Mon Sep 12 04:17:21 PDT 2016
On Monday, 12 September 2016 at 00:46:16 UTC, Basile B. wrote:
> I have this function, written in iasm:
>
> °°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°
> T foo(T)(T x, T c)
> if (is(T == float) || is(T == double))
> {
>
>     version(none)
>     {
>         return x*x*x - x*x*c + x*c;
>     }
>     else asm
>     {
>         naked;
>         movsd XMM3, XMM1;
>         mulsd XMM0, XMM1;
>         mulsd XMM1, XMM1;
>         movsd XMM2, XMM1;
>         mulsd XMM1, XMM3;
>         addsd XMM1, XMM0;
>         mulsd XMM0, XMM3;
>         subsd XMM1, XMM0;
>         movsd XMM0, XMM1;
>         ret;
>     }
> }
>
> [...]
>
>
> When I change the version(none) to version(all), the benchmark
> is **7X** faster (e.g. 410 against 3000 for the iasm version).
>
> This difference doesn't look normal at all.
> Does anyone know why? The usage of the stack to move xmm1 into
> xmm0 is particularly strange...
The function was probably implicitly wrapped in the code that is
normally generated for a try/catch block. With
asm nothrow
{
}
I get slightly better performance than the code DMD generates.
Something to add to the specification I guess; nothing there
documents this behavior.
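
For reference, here is a minimal sketch of the variant described above. It is not the exact code that was benchmarked: the template is replaced by a plain double function for brevity, and the instruction sequence is copied as-is from the quoted post, so it relies on the same register layout (both arguments arriving in XMM0 and XMM1). The version(D_InlineAsm_X86_64) guard and the plain D fallback are additions of mine so that the snippet still compiles on other targets.

```d
// Sketch only: same body as the quoted function, but the asm block is
// marked nothrow so the compiler does not have to assume it can throw.
double foo(double x, double c) nothrow
{
    version (D_InlineAsm_X86_64)
    {
        asm nothrow
        {
            naked;
            // Assumption taken from the quoted post: the two arguments
            // are in XMM0 and XMM1 on entry, the result goes in XMM0.
            movsd XMM3, XMM1;
            mulsd XMM0, XMM1;
            mulsd XMM1, XMM1;
            movsd XMM2, XMM1;
            mulsd XMM1, XMM3;
            addsd XMM1, XMM0;
            mulsd XMM0, XMM3;
            subsd XMM1, XMM0;
            movsd XMM0, XMM1;
            ret;
        }
    }
    else
    {
        // Fallback for targets without x86_64 inline asm.
        return x*x*x - x*x*c + x*c;
    }
}

unittest
{
    // With x == c the expression reduces to x*x, so this check does not
    // depend on which argument lands in which register.
    assert(foo(2.0, 2.0) == 4.0);
}
```

Compiling with -unittest -main runs the check. Whether the speed-up reproduces will depend on the compiler version and flags used for the benchmark.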