[Issue 9963] New: Absurdly Inefficient Codegen For Adding Boolean Predicates
d-bugmail at puremagic.com
Fri Apr 19 12:25:34 PDT 2013
http://d.puremagic.com/issues/show_bug.cgi?id=9963
Summary: Absurdly Inefficient Codegen For Adding Boolean Predicates
Product: D
Version: D1 & D2
Platform: All
OS/Version: All
Status: NEW
Keywords: performance
Severity: normal
Priority: P2
Component: DMD
AssignedTo: nobody at puremagic.com
ReportedBy: dsimcha at yahoo.com
--- Comment #0 from David Simcha <dsimcha at yahoo.com> 2013-04-19 12:25:32 PDT ---
D source Code:
__gshared ulong n_less = 0, n_greater = 0;

void doConditional(ubyte thresh, ubyte[] arr) {
    ulong l, g;
    foreach(val; arr) {
        l += (thresh < val);
        g += !(thresh < val);
    }
    n_less += l;
    n_greater += g;
}
DMD-generated ASM code (foreach loop only, from obj2asm, when compiled with -O
-inline -release):
L33:    mov     RDX,-018h[RBP]
        mov     CL,[RDX][R8]
        cmp     CL,R9B
        mov     EAX,1
        ja      L47
        xor     EAX,EAX
L47:    cdqe
        add     R11,RAX
        cmp     R9B,CL
        sbb     EAX,EAX
        inc     EAX
        cdqe
        add     RBX,RAX
        inc     R8
        cmp     R8,-010h[RBP]
        jb      L33
Why does DMD use a conditional branch, sbb + inc, and two cmp instructions
instead of a single cmp followed by setb and setae?
This executes in about 0.495 seconds for an array of 100 million elements.
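At the source level, the setb/setae question amounts to evaluating the
predicate once and reusing the result for both counters. The following is a
minimal sketch of that idea, not code from the report: the function name
doConditionalSingleCompare and the local variable less are hypothetical, and
whether any given compiler turns this into a single cmp plus setb/setae is an
assumption that would have to be checked in the generated assembly.

```d
__gshared ulong n_less = 0, n_greater = 0;

// Hypothetical variant of doConditional: the comparison is written once
// per element and both counters are updated from the same flag.
void doConditionalSingleCompare(ubyte thresh, const(ubyte)[] arr)
{
    ulong l, g;
    foreach (val; arr)
    {
        immutable bool less = thresh < val; // one comparison per element
        l += less;   // bool converts implicitly to 0 or 1
        g += !less;  // complement of the same flag
    }
    n_less += l;
    n_greater += g;
}
```

Because every element increments exactly one of the two counters, l + g
always equals arr.length, so a compiler (or the programmer) could also derive
g as arr.length - l after the loop.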
GCC's codegen for the same function:
L20:    movzx   ECX,[RAX][RDX]
        xor     R10D,R10D
        cmp     ECX,EDI
        setnle  R10B
        add     R9,R10
        cmp     ECX,EDI
        setle   CL
        add     RAX,1
        movzx   ECX,CL
        add     R8,RCX
        cmp     RAX,RSI
        jne     L20
This executes in about 0.095 seconds for an array of 100 million elements.
My hand-compilation for this loop:
LStart:
        cmp     DL, byte ptr [RAX];
        setae   R9B;
        adc     R10, 0;
        inc     RAX;
        add     R11, R9;
        cmp     RAX, RBX;
        jb      LStart;
This executes in about 0.071 seconds for an array of 100 million elements.
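The report does not show how these timings were taken. The sketch below is
one way such a measurement could be set up, under assumptions that are not in
the original: the helper name timeDoConditionalMsecs, the fixed seed, the
uniformly random byte input and the use of Phobos's std.datetime.stopwatch
are all illustrative choices.

```d
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.random : Random, uniform;

__gshared ulong n_less = 0, n_greater = 0;

// The function under test, copied from the report.
void doConditional(ubyte thresh, ubyte[] arr)
{
    ulong l, g;
    foreach (val; arr)
    {
        l += (thresh < val);
        g += !(thresh < val);
    }
    n_less += l;
    n_greater += g;
}

// Hypothetical helper: fills an array of n pseudo-random bytes (fixed seed,
// so runs are repeatable) and returns the wall-clock time of a single
// doConditional call in milliseconds.
long timeDoConditionalMsecs(size_t n, ubyte thresh)
{
    auto rng = Random(42);
    auto arr = new ubyte[n];
    foreach (ref b; arr)
        b = cast(ubyte) uniform(0, 256, rng); // value in [0, 256)

    auto sw = StopWatch(AutoStart.yes);
    doConditional(thresh, arr);
    sw.stop();
    return sw.peek.total!"msecs";
}
```

Calling timeDoConditionalMsecs(100_000_000, 128) from main would time one
pass over 100 million elements, the array size used in the report. Only the
call itself is timed, so array allocation and filling do not affect the
result.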