Counting bits in a ulong

Dominikus Dittes Scherkl dominikus at scherkl.de
Thu Jun 4 15:25:05 UTC 2020


On Thursday, 4 June 2020 at 01:16:13 UTC, H. S. Teoh wrote:
> So recently I needed to implement a function to count the 
> number of 1
> bits in a ulong inside a (very) hot inner loop.  There's of 
> course the
> naïve method (`val` is the input ulong):
>
>         uint count;
>         foreach (i; 0 .. 64)
>         {
>             if (val & (1L << i))
>                 count++;
>         }
>         return count;
>
> Knowing that this code is inside a hotspot, I wrote a parallel 
> bit counter based on the idea from [1]:
>
>         // Parallel bit count.
>         // We count upper/lower halves separately, interspersed 
> to maximize
>         // hyperthreading. >:-)
>         uint v1 = cast(uint)(val >> 32);
>         uint v2 = cast(uint)val & 0xFFFF_FFFF;
>
>         v1 = v1 - ((v1 >> 1) & 0x5555_5555);
>         v2 = v2 - ((v2 >> 1) & 0x5555_5555);
>         v1 = ((v1 >> 2) & 0x3333_3333) + (v1 & 0x3333_3333);
>         v2 = ((v2 >> 2) & 0x3333_3333) + (v2 & 0x3333_3333);
>         v1 = ((v1 >> 4) & 0x0F0F_0F0F) + (v1 & 0x0F0F_0F0F);
>         v2 = ((v2 >> 4) & 0x0F0F_0F0F) + (v2 & 0x0F0F_0F0F);
>         v1 = ((v1 >> 8) & 0x00FF_00FF) + (v1 & 0x00FF_00FF);
>         v2 = ((v2 >> 8) & 0x00FF_00FF) + (v2 & 0x00FF_00FF);
>         v1 = ((v1 >> 16) & 0x0000_FFFF) + (v1 & 0x0000_FFFF);
>         v2 = ((v2 >> 16) & 0x0000_FFFF) + (v2 & 0x0000_FFFF);
wikipedia recommends this:
     x -= (x >> 1) & m1;
     x = (x & m2) + ((x >> 2) & m2);
     x = (x + (x >> 4)) & m4;
     x += x >>  8;  //put count of each 16 bits into their lowest 
8 bits
     x += x >> 16;  //put count of each 32 bits into their lowest 
8 bits
     x += x >> 32;  //put count of each 64 bits into their lowest 
8 bits
     return x & 0x7f;
so in the last 4 lines you do too much work.
better use the intrinsics: popcnt(val>>32)+popcnt(val & uint.max)



More information about the Digitalmars-d mailing list