Bits rotations
bearophile
bearophileHUGS at lycos.com
Wed Oct 17 19:36:58 PDT 2012
In cryptographic code, and generally in bit-twiddling code,
rotation of the bits in a word is a common operation. It's so
common that Intel has made it an asm instruction.
A demo program:
uint foo(in uint x) pure nothrow {
return (x << 11) | (x >> (32 - 11));
}
uint bar(uint x, in ubyte n) pure nothrow {
asm {
mov EAX, x;
mov CL, n;
rol EAX, CL;
mov x, EAX;
}
return x;
}
void main() {
import std.stdio;
uint x = 4290772992U;
writefln("%032b", x);
uint y = foo(x);
writefln("%032b", y);
uint z = bar(x, 11);
writefln("%032b", z);
}
Its output, dmd -O -release -inline:
11111111110000000000000000000000
00000000000000000000011111111110
00000000000000000000011111111110
Even with full optimizations DMD seems not able to detect the
rotation in foo(), and writing assembly as in bar() is not an
option, because the code is even longer and bar() can't be
inlined. This is the ASM generated (32 bit):
_D4test3fooFNaNbxkZk:
push EAX
mov ECX,[ESP]
shl EAX,0Bh
shr ECX,015h
or EAX,ECX
pop ECX
ret
_D4test3barFNaNbkxhZk:
push EBP
mov EBP,ESP
push EAX
mov EAX,8[EBP]
mov CL,-4[EBP]
rol EAX,CL
mov 8[EBP],EAX
mov EAX,8[EBP]
mov ESP,EBP
pop EBP
ret 4
GDC 4.6.3 does better, recognizing the rol (foo() can be inlined,
usually becoming 1 instruction in the middle of other code) (I
like the demangling here!):
pure nothrow uint example.foo(const(uint)):
movl %edi, %eax
roll $11, %eax
ret
pure nothrow uint example.bar(uint, const(ubyte)):
movl %edi, -4(%rsp)
movb %sil, -5(%rsp)
movl -4(%rsp), %eax
movb -5(%rsp), %cl
roll %cl, %eax
movl %eax, -4(%rsp)
movl -4(%rsp), %eax
ret
So I'd like to write:
uint spam(in uint x) pure nothrow @safe {
import core.bitop: rol;
return rol(x, 11);
}
This is better because:
- It's standard. If a CPU supports rol (or ror) the compiler uses
it. If it doesn't support it, the compiler uses shifts and an or.
- It's shorter and more readable. For me "rol(x, 11)" is rather
more easy to read and debug than code like "(x << 11) | (x >> (32
- 11))".
- spam() is inlinable, just like foo() and unlike bar().
- It doesn't rely on compiler optimizations, that sometimes are
not present. If the CPU supports the rol, the compiler doesn't
need pattern matching, it just spits out a rol. DMD currently has
such optimization, but apparently here it's not working, I don't
know why. For such basic operation, that is built in many CPUs
there is no need for compiler optimizations.
Bye,
bearophile
More information about the Digitalmars-d
mailing list