Counting bits in a ulong

Patrick Schluter Patrick.Schluter at bbox.fr
Thu Jun 4 18:11:34 UTC 2020


On Thursday, 4 June 2020 at 17:38:48 UTC, H. S. Teoh wrote:
> On Thu, Jun 04, 2020 at 06:40:25AM -0700, H. S. Teoh via 
> Digitalmars-d wrote:
>> On Thu, Jun 04, 2020 at 01:27:39AM +0000, Paul Backus via 
>> Digitalmars-d wrote: [...]
>> > Interesting stuff. Did you compare your implementations with 
>> > the popcnt function in core.bitop? There could be a 
>> > potential improvement here.
>> 
>> On Thu, Jun 04, 2020 at 07:42:07AM +0000, Patrick Schluter via 
>> Digitalmars-d wrote: [...]
>> > You should try to use the popcnt instruction of the cpu (in 
>> > asm or
>> > with intrinsics). It's at least 3 or 4 order of magnitude 
>> > (i.e.
>> > between 1000 and 10000 times) faster than any of these 
>> > tricks.
> [...]
>> As it turns out, my CPU (AMD Phenom II X6 1055T) doesn't 
>> support the
>> instruction. :-(
> [...]
>
> Actually, I was wrong, my CPU *does* have this instruction, but 
> I needed to pass the right -mcpu flag to LDC before it will 
> emit it.  After turning it on, I got a huge performance boost; 
> it completely moved the function that calls popcnt out of the 
> hotspot into the background! Now it's other functions that have 
> become the bottleneck.  Cool stuff!
>
Cool, I just wanted to comment as I have myself a Phenom II X6 
1090T and was wondering.
Btw, if you want to use the Kernighan function for values with a 
lot of set bits, count the inverted value:

That's what I had in my code (gcc C)

DEFINE_INLINE  int PopCountFewClr(uint64_t w)
{
#ifdef __POPCNT__
   return __builtin_popcountll(w);
#else
   return CHAR_BIT*sizeof w - PopCountFewSet(~w);
#endif
}



More information about the Digitalmars-d mailing list