Counting bits in a ulong
Dominikus Dittes Scherkl
dominikus at scherkl.de
Thu Jun 4 15:25:05 UTC 2020
On Thursday, 4 June 2020 at 01:16:13 UTC, H. S. Teoh wrote:
> So recently I needed to implement a function to count the
> number of 1
> bits in a ulong inside a (very) hot inner loop. There's of
> course the
> naïve method (`val` is the input ulong):
>
> uint count;
> foreach (i; 0 .. 64)
> {
> if (val & (1L << i))
> count++;
> }
> return count;
>
> Knowing that this code is inside a hotspot, I wrote a parallel
> bit counter based on the idea from [1]:
>
> // Parallel bit count.
> // We count upper/lower halves separately, interspersed
> to maximize
> // hyperthreading. >:-)
> uint v1 = cast(uint)(val >> 32);
> uint v2 = cast(uint)val & 0xFFFF_FFFF;
>
> v1 = v1 - ((v1 >> 1) & 0x5555_5555);
> v2 = v2 - ((v2 >> 1) & 0x5555_5555);
> v1 = ((v1 >> 2) & 0x3333_3333) + (v1 & 0x3333_3333);
> v2 = ((v2 >> 2) & 0x3333_3333) + (v2 & 0x3333_3333);
> v1 = ((v1 >> 4) & 0x0F0F_0F0F) + (v1 & 0x0F0F_0F0F);
> v2 = ((v2 >> 4) & 0x0F0F_0F0F) + (v2 & 0x0F0F_0F0F);
> v1 = ((v1 >> 8) & 0x00FF_00FF) + (v1 & 0x00FF_00FF);
> v2 = ((v2 >> 8) & 0x00FF_00FF) + (v2 & 0x00FF_00FF);
> v1 = ((v1 >> 16) & 0x0000_FFFF) + (v1 & 0x0000_FFFF);
> v2 = ((v2 >> 16) & 0x0000_FFFF) + (v2 & 0x0000_FFFF);
wikipedia recommends this:
x -= (x >> 1) & m1;
x = (x & m2) + ((x >> 2) & m2);
x = (x + (x >> 4)) & m4;
x += x >> 8; //put count of each 16 bits into their lowest
8 bits
x += x >> 16; //put count of each 32 bits into their lowest
8 bits
x += x >> 32; //put count of each 64 bits into their lowest
8 bits
return x & 0x7f;
so in the last 4 lines you do too much work.
better use the intrinsics: popcnt(val>>32)+popcnt(val & uint.max)
More information about the Digitalmars-d
mailing list