[Issue 9963] New: Absurdly Inefficient Codegen For Adding Boolean Predicates

Fri Apr 19 12:25:34 PDT 2013

http://d.puremagic.com/issues/show_bug.cgi?id=9963

           Summary: Absurdly Inefficient Codegen For Adding Boolean
                    Predicates
           Product: D
           Version: D1 & D2
          Platform: All
        OS/Version: All
            Status: NEW
          Keywords: performance
          Severity: normal
          Priority: P2
         Component: DMD
        AssignedTo: nobody at puremagic.com
        ReportedBy: dsimcha at yahoo.com

--- Comment #0 from David Simcha <dsimcha at yahoo.com> 2013-04-19 12:25:32 PDT ---
D source code:

__gshared ulong n_less = 0, n_greater = 0;

void doConditional(ubyte thresh, ubyte[] arr) {
  ulong l, g;
  foreach(val; arr) {
    l += (thresh < val);
    g += !(thresh < val);
  }

  n_less += l;
  n_greater += g;
}
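
For reference, here is a minimal sketch of how the function above could be
exercised on a small input. The main() wrapper and the sample values are
illustrative assumptions, not part of the original test case.

```d
__gshared ulong n_less = 0, n_greater = 0;

void doConditional(ubyte thresh, ubyte[] arr) {
  ulong l, g;
  foreach(val; arr) {
    l += (thresh < val);    // counts elements strictly above thresh
    g += !(thresh < val);   // counts elements at or below thresh
  }

  n_less += l;
  n_greater += g;
}

void main() {
  ubyte[] sample = [1, 5, 9, 5, 200];   // hypothetical sample data
  doConditional(5, sample);
  // 9 and 200 exceed the threshold; 1, 5 and 5 do not.
  assert(n_less == 2 && n_greater == 3);
  // Every element lands in exactly one of the two counters.
  assert(n_less + n_greater == sample.length);
}
```

Since the two predicates are exact complements, each element increments
exactly one counter, which is why a single comparison per iteration would be
enough.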

DMD-generated ASM code (foreach loop only, from obj2asm, when compiled with -O
-inline -release):

L33:        mov    RDX,-018h[RBP]
        mov    CL,[RDX][R8]
        cmp    CL,R9B
        mov    EAX,1
        ja    L47
        xor    EAX,EAX
L47:        cdqe
        add    R11,RAX
        cmp    R9B,CL
        sbb    EAX,EAX
        inc    EAX
        cdqe
        add    RBX,RAX
        inc    R8
        cmp    R8,-010h[RBP]
        jb    L33

Why use a branch, sbb + inc and two cmp instructions instead of just using
setb and setae?
This executes in about 0.495 seconds for an array of 100 million elements.
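
A timing like this could be reproduced with a harness along the following
lines. This is only a sketch: the helper name timeMsecs, the random input
data, the threshold of 128, and the use of std.datetime.stopwatch (found in
current Phobos releases) are all assumptions, since the report does not say
how the measurement was actually taken.

```d
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.random : uniform;
import std.stdio : writefln;

__gshared ulong n_less = 0, n_greater = 0;

void doConditional(ubyte thresh, ubyte[] arr) {
  ulong l, g;
  foreach(val; arr) {
    l += (thresh < val);
    g += !(thresh < val);
  }

  n_less += l;
  n_greater += g;
}

// Hypothetical helper: times a single call over n random bytes.
long timeMsecs(size_t n, ubyte thresh) {
  auto arr = new ubyte[n];
  foreach (ref b; arr)
    b = cast(ubyte) uniform(0, 256);   // assumed input distribution

  auto sw = StopWatch(AutoStart.yes);  // only the call itself is timed
  doConditional(thresh, arr);
  sw.stop();
  return sw.peek.total!"msecs";
}

void main() {
  enum n = 100_000_000;   // array size quoted in the report
  writefln("%d ms for %d elements", timeMsecs(n, 128), n);
}
```

Building with -O -inline -release, as in the report, keeps the comparison
consistent with the numbers quoted here; absolute times will vary by machine.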

GCC's codegen for the same function:

L20:        movzx    ECX,[RAX][RDX]
        xor    R10D,R10D
        cmp    ECX,EDI
        setnle    R10B
        add    R9,R10
        cmp    ECX,EDI
        setle    CL
        add    RAX,1
        movzx    ECX,CL
        add    R8,RCX
        cmp    RAX,RSI
        jne    L20

This executes in about 0.095 seconds for an array of 100 million elements.

My hand-compilation for this loop:

   LStart:
    cmp DL, byte ptr [RAX];
    setae R9B;
    adc R10, 0;
    inc RAX;
    add R11, R9;
    cmp RAX, RBX;
    jb LStart;

This executes in about 0.071 seconds for an array of 100 million elements.
