Bits rotations
Iain Buclaw
ibuclaw at ubuntu.com
Thu Oct 18 01:00:17 PDT 2012
On 18 October 2012 03:36, bearophile <bearophileHUGS at lycos.com> wrote:
> In cryptographic code, and generally in bit-twiddling code, rotation of the
> bits in a word is a common operation. It's so common that Intel has made it
> an asm instruction.
>
> A demo program:
>
>
> uint foo(in uint x) pure nothrow {
> return (x << 11) | (x >> (32 - 11));
> }
>
> uint bar(uint x, in ubyte n) pure nothrow {
> asm {
> mov EAX, x;
> mov CL, n;
> rol EAX, CL;
> mov x, EAX;
> }
> return x;
> }
>
> void main() {
> import std.stdio;
> uint x = 4290772992U;
> writefln("%032b", x);
> uint y = foo(x);
> writefln("%032b", y);
> uint z = bar(x, 11);
> writefln("%032b", z);
> }
>
>
> Its output, dmd -O -release -inline:
>
> 11111111110000000000000000000000
> 00000000000000000000011111111110
> 00000000000000000000011111111110
>
>
> Even with full optimizations DMD seems not able to detect the rotation in
> foo(), and writing assembly as in bar() is not an option, because the code
> is even longer and bar() can't be inlined. This is the ASM generated (32
> bit):
>
>
> _D4test3fooFNaNbxkZk:
> push EAX
> mov ECX,[ESP]
> shl EAX,0Bh
> shr ECX,015h
> or EAX,ECX
> pop ECX
> ret
>
> _D4test3barFNaNbkxhZk:
> push EBP
> mov EBP,ESP
> push EAX
> mov EAX,8[EBP]
> mov CL,-4[EBP]
> rol EAX,CL
> mov 8[EBP],EAX
> mov EAX,8[EBP]
> mov ESP,EBP
> pop EBP
> ret 4
>
>
> GDC 4.6.3 does better, recognizing the rol (foo() can be inlined, usually
> becoming 1 instruction in the middle of other code) (I like the demangling
> here!):
>
> pure nothrow uint example.foo(const(uint)):
> movl %edi, %eax
> roll $11, %eax
> ret
>
> pure nothrow uint example.bar(uint, const(ubyte)):
> movl %edi, -4(%rsp)
> movb %sil, -5(%rsp)
> movl -4(%rsp), %eax
> movb -5(%rsp), %cl
> roll %cl, %eax
> movl %eax, -4(%rsp)
> movl -4(%rsp), %eax
> ret
>
>
> So I'd like to write:
>
> uint spam(in uint x) pure nothrow @safe {
> import core.bitop: rol;
> return rol(x, 11);
> }
>
>
> This is better because:
> - It's standard. If a CPU supports rol (or ror) the compiler uses it. If it
> doesn't support it, the compiler uses shifts and an or.
> - It's shorter and more readable. For me "rol(x, 11)" is rather more easy to
> read and debug than code like "(x << 11) | (x >> (32 - 11))".
> - spam() is inlinable, just like foo() and unlike bar().
> - It doesn't rely on compiler optimizations, that sometimes are not present.
> If the CPU supports the rol, the compiler doesn't need pattern matching, it
> just spits out a rol. DMD currently has such optimization, but apparently
> here it's not working, I don't know why. For such basic operation, that is
> built in many CPUs there is no need for compiler optimizations.
>
> Bye,
> bearophile
In the gdc-4.6 package you have there, it's only naked asm that can't
be inlined. However it is worth noting that DIASM is no longer in
mainline gdc.
Thanks,
--
Iain Buclaw
*(p < e ? p++ : p) = (c & 0x0f) + '0';
More information about the Digitalmars-d
mailing list