[Issue 15255] New: Generated better code for saturation arithmetic
via Digitalmars-d-bugs
digitalmars-d-bugs at puremagic.com
Wed Oct 28 20:26:38 PDT 2015
https://issues.dlang.org/show_bug.cgi?id=15255
Issue ID: 15255
Summary: Generated better code for saturation arithmetic
Product: D
Version: D2
Hardware: All
OS: All
Status: NEW
Severity: enhancement
Priority: P1
Component: dmd
Assignee: nobody at puremagic.com
Reporter: bugzilla at digitalmars.com
I've just spent a couple
of hours trying to convince the compiler to generate the code I want
to see for saturating arithmetic. DMD generates a lot of extra work.
LDC does pretty well, but I still couldn't convince it to produce the
code I wanted to see. GDC is the same as LDC, except one of my
permutations seems to have hit the sweet spot! ;)
The asm I'm chasing is:
add %eax,%ebx
sbb %eax,%eax
or %ebx,%eax
I'm pretty sure this is the best result, but I can't convince DMD or
LDC to emit that code.
Code:
pure nothrow @safe @nogc:
template Promote(U)
{
static if(is(U == ubyte)) alias Promote = ushort;
else static if(is(U == ushort)) alias Promote = uint;
else static if(is(U == uint)) alias Promote = ulong;
else static if(is(U == ulong)) alias Promote = ucent;
else static assert(false, "!");
}
// note; I cast results everywhere because D's integer promotion
seemed to interfere
U addSat(U)(U a, U b) if(isIntegral!U && isUnsigned!U)
{
return cast(U)((a > cast(U)(U.max - b)) ? cast(U)U.max : cast(U)(a + b));
}
U addSat2(U)(U a, U b) if(isIntegral!U && isUnsigned!U)
{
Promote!U c = cast(Promote!U)a+b;
return cast(U)(-cast(U)(c >> (U.sizeof*8)) | cast(U)c);
// expect:
// add %eax,%ebx
// sbb %eax,%eax
// or %ebx,%eax
}
U addSat3(U)(U a, U b) if(isIntegral!U && isUnsigned!U)
{
U c = a + b;
if (c < a) // can only happen due to overflow
c = U(-1);
return c;
}
U addSat4(U)(U a, U b) if(isIntegral!U && isUnsigned!U)
{
U c = a + b;
c |= -U(c < a);
return c;
}
When I generate those functions for uint, I get these:
DMD's output:
pure nothrow @nogc @safe uint
std.experimental.color.internal.saturate.addSat!(uint).addSat(uint,
uint):
0000000000000000: push rbp
0000000000000001: mov rbp,rsp
0000000000000004: mov dword ptr [rbp+10h],ecx
0000000000000007: mov dword ptr [rbp+18h],edx
000000000000000A: mov eax,0FFFFFFFFh
000000000000000F: sub eax,dword ptr [rbp+10h]
0000000000000012: cmp eax,dword ptr [rbp+18h]
0000000000000015: mov eax,0FFFFFFFFh
000000000000001A: jb 0000000000000022
000000000000001C: mov ecx,dword ptr [rbp+10h]
000000000000001F: lea eax,[rdx+rcx]
0000000000000022: pop rbp
0000000000000023: ret
pure nothrow @nogc @safe uint
std.experimental.color.internal.saturate.addSat2!(uint).addSat2(uint,
uint):
0000000000000000: push rbp
0000000000000001: mov rbp,rsp
0000000000000004: sub rsp,10h
0000000000000008: mov dword ptr [rbp+10h],ecx
000000000000000B: mov dword ptr [rbp+18h],edx
000000000000000E: mov eax,dword ptr [rbp+18h]
0000000000000011: mov edx,dword ptr [rbp+10h]
0000000000000014: add rax,rdx
0000000000000017: mov qword ptr [rbp-8],rax
000000000000001B: mov eax,dword ptr [rbp-4]
000000000000001E: neg eax
0000000000000020: or eax,dword ptr [rbp-8]
0000000000000023: lea rsp,[rbp]
0000000000000027: pop rbp
0000000000000028: ret
pure nothrow @nogc @safe uint
std.experimental.color.internal.saturate.addSat3!(uint).addSat3(uint,
uint):
0000000000000000: push rbp
0000000000000001: mov rbp,rsp
0000000000000004: lea eax,[rdx+rcx]
0000000000000007: cmp eax,edx
0000000000000009: jae 0000000000000010
000000000000000B: mov eax,0FFFFFFFFh
0000000000000010: pop rbp
0000000000000011: ret
pure nothrow @nogc @safe uint
std.experimental.color.internal.saturate.addSat4!(uint).addSat4(uint,
uint):
0000000000000000: push rbp
0000000000000001: mov rbp,rsp
0000000000000004: lea r8d,[rdx+rcx]
0000000000000008: cmp r8d,edx
000000000000000B: setb al
000000000000000E: movzx eax,al
0000000000000011: neg eax
0000000000000013: or r8d,eax
0000000000000016: mov eax,r8d
0000000000000019: pop rbp
000000000000001A: ret
LDC's output:
addSat:
leal (%rdx,%rcx), %r8d
notl %ecx
cmpl %ecx, %edx
movl $-1, %eax
cmovbel %r8d, %eax
retq
addSat2:
movl %edx, %edx
movl %ecx, %eax
addq %rdx, %rax
movq %rax, %rcx
shrq $32, %rcx
negl %ecx
orl %ecx, %eax
retq
addSat3:
addl %edx, %ecx
movl $-1, %eax
cmovael %ecx, %eax
retq
addSat4:
addl %edx, %ecx
movl $-1, %eax
cmovael %ecx, %eax
retq
GDC is almost identical to LDC, except:
addSat4:
addl %edi, %esi
sbbl %eax, %eax
orl %esi, %eax
ret
I think this is what I was chasing! (not sure what the deal is with
the choice of registers though?)
Anyway, I'm surprised by almost all these results, I thought all
compilers would do better on such simple code. Thought you might be
interested.
Cheers.
- Manu
--
More information about the Digitalmars-d-bugs
mailing list