[Issue 24411] New: [CODEGEN] bad shl codegen

Mon Feb 26 06:19:14 UTC 2024

https://issues.dlang.org/show_bug.cgi?id=24411

          Issue ID: 24411
           Summary: [CODEGEN] bad shl codegen
           Product: D
           Version: D2
          Hardware: x86
                OS: All
            Status: NEW
          Severity: major
          Priority: P1
         Component: dmd
          Assignee: nobody at puremagic.com
          Reporter: turkeyman at gmail.com

I just uncovered a very surprising bug.

I have a function like this:

bool validCode(ubyte code)
{
    enum validCodes = 0b1111100111001100111111110;
    // explicit comparison: D does not implicitly convert this int to bool
    return ((1 << code) & validCodes) != 0;
}

void test()
{
    assert(validCode(76) == false); // FAILS!
}

Here `code` is an enum with a few sparse values close to zero, and only the
specified values are valid. I detect invalid values by testing the bit for
`code` against a bit mask.

`validCodes` is 32 bits wide, and I assumed that `1 << x` would be 0 for any x
of 32 or more, so the function above would return false for such `code` values.
It turns out this is NOT the case, and I get surprising results.

This code compiles to:
00007FF6D1FE781B  mov         eax,1  
00007FF6D1FE7820  movzx       ecx,byte ptr [code]  
00007FF6D1FE7824  shl         eax,cl  
00007FF6D1FE7826  test        eax,1F399FEh  

In this case, `code` is 76 (an invalid code), and it turns out that the x86
`shl` doesn't shift left by 76 (which would give 0). Instead, `shl` takes only
the lower 5 bits of cl and shifts by that amount, which here is 12
(76 & 31 == 12). The result is `1 << 12`, which coincides with a 1-bit in
`validCodes`, so the function returns TRUE in this case!
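
To make the arithmetic above concrete, here is a minimal sketch that models the
observed `shl` behavior in D by masking the shift count explicitly. The helper
name `shlLikeX86` is hypothetical and is not part of the report; it only mimics
what the 32-bit instruction does with the count in cl.

```d
// Same mask as in the report; it equals 0x1F399FE in the disassembly.
enum uint validCodes = 0b1111100111001100111111110;

// Hypothetical helper: models a 32-bit `shl eax, cl`, which uses only
// the low 5 bits of the count.
uint shlLikeX86(uint value, ubyte count)
{
    return value << (count & 31);
}

unittest
{
    static assert((76 & 31) == 12);                // 76 = 2*32 + 12
    assert(shlLikeX86(1, 76) == 0x1000);           // same as 1 << 12
    assert((shlLikeX86(1, 76) & validCodes) != 0); // bit 12 is set in the mask
}
```

Compiling with `dmd -unittest -main` runs the checks.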

Is the << operator in the language specified to use only the lower 5 bits of
the shift count?
I think this is a codegen bug... the language shouldn't assume that the user
has clamped the value into the range required by the x86 opcode.
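
For comparison, a sketch of the clamping the user would have to do by hand if
the shift count is not range-checked by the language. The name
`validCodeChecked` is made up for illustration; this is one possible
workaround, not a statement about how dmd should behave.

```d
enum uint validCodes = 0b1111100111001100111111110;

// Hypothetical guarded variant of validCode from the report.
bool validCodeChecked(ubyte code)
{
    // Reject counts that a 32-bit shift cannot represent before shifting.
    if (code >= 32)
        return false;
    return ((1u << code) & validCodes) != 0;
}

unittest
{
    assert(!validCodeChecked(76)); // the failing case from the report
    assert(validCodeChecked(12));  // bit 12 is set in the mask
    assert(!validCodeChecked(0));  // bit 0 is clear
    assert(validCodeChecked(1));   // bit 1 is set
}
```

Widening the shift to 64 bits would not help for this particular value: on
x86-64 a 64-bit shift uses the low 6 bits of the count, and 76 & 63 is also 12.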
