Bits rotations

Thu Oct 18 01:00:17 PDT 2012

On 18 October 2012 03:36, bearophile <bearophileHUGS at lycos.com> wrote:
> In cryptographic code, and generally in bit-twiddling code, rotation of the
> bits in a word is a common operation. It's so common that Intel has made it
> an asm instruction.
>
> A demo program:
>
>
> uint foo(in uint x) pure nothrow {
>     return (x << 11) | (x >> (32 - 11));
> }
>
> uint bar(uint x, in ubyte n) pure nothrow {
>     asm {
>         mov EAX, x;
>         mov CL, n;
>         rol EAX, CL;
>         mov x, EAX;
>     }
>     return x;
> }
>
> void main() {
>     import std.stdio;
>     uint x = 4290772992U;
>     writefln("%032b", x);
>     uint y = foo(x);
>     writefln("%032b", y);
>     uint z = bar(x, 11);
>     writefln("%032b", z);
> }
>
>
> Its output, dmd -O -release -inline:
>
> 11111111110000000000000000000000
> 00000000000000000000011111111110
> 00000000000000000000011111111110
>
>
> Even with full optimizations DMD seems not able to detect the rotation in
> foo(), and writing assembly as in bar() is not an option, because the code
> is even longer and bar() can't be inlined. This is the ASM generated (32
> bit):
>
>
> _D4test3fooFNaNbxkZk:
>         push    EAX
>         mov ECX,[ESP]
>         shl EAX,0Bh
>         shr ECX,015h
>         or  EAX,ECX
>         pop ECX
>         ret
>
> _D4test3barFNaNbkxhZk:
>         push    EBP
>         mov EBP,ESP
>         push    EAX
>         mov EAX,8[EBP]
>         mov CL,-4[EBP]
>         rol EAX,CL
>         mov 8[EBP],EAX
>         mov EAX,8[EBP]
>         mov ESP,EBP
>         pop EBP
>         ret 4
>
>
> GDC 4.6.3 does better, recognizing the rol (foo() can be inlined, usually
> becoming 1 instruction in the middle of other code) (I like the demangling
> here!):
>
> pure nothrow uint example.foo(const(uint)):
>     movl    %edi, %eax
>     roll    $11, %eax
>     ret
>
> pure nothrow uint example.bar(uint, const(ubyte)):
>     movl  %edi, -4(%rsp)
>     movb  %sil, -5(%rsp)
>     movl -4(%rsp), %eax
>     movb -5(%rsp), %cl
>     roll %cl, %eax
>     movl %eax, -4(%rsp)
>     movl -4(%rsp), %eax
>     ret
>
>
> So I'd like to write:
>
> uint spam(in uint x) pure nothrow @safe {
>     import core.bitop: rol;
>     return rol(x, 11);
> }
>
>
> This is better because:
> - It's standard. If a CPU supports rol (or ror) the compiler uses it. If it
> doesn't support it, the compiler uses shifts and an or.
> - It's shorter and more readable. For me "rol(x, 11)" is rather more easy to
> read and debug than code like "(x << 11) | (x >> (32 - 11))".
> - spam() is inlinable, just like foo() and unlike bar().
> - It doesn't rely on compiler optimizations, that sometimes are not present.
> If the CPU supports the rol, the compiler doesn't need pattern matching, it
> just spits out a rol. DMD currently has such optimization, but apparently
> here it's not working, I don't know why. For such basic operation, that is
> built in many CPUs there is no need for compiler optimizations.
>
> Bye,
> bearophile

In the gdc-4.6 package you have there, it's only naked asm that can't
be inlined.   However it is worth noting that DIASM is no longer in
mainline gdc.

Thanks,
-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';