Counting bits in a ulong
Patrick Schluter
Patrick.Schluter at bbox.fr
Thu Jun 4 18:11:34 UTC 2020
On Thursday, 4 June 2020 at 17:38:48 UTC, H. S. Teoh wrote:
> On Thu, Jun 04, 2020 at 06:40:25AM -0700, H. S. Teoh via
> Digitalmars-d wrote:
>> On Thu, Jun 04, 2020 at 01:27:39AM +0000, Paul Backus via
>> Digitalmars-d wrote: [...]
>> > Interesting stuff. Did you compare your implementations with
>> > the popcnt function in core.bitop? There could be a
>> > potential improvement here.
>>
>> On Thu, Jun 04, 2020 at 07:42:07AM +0000, Patrick Schluter via
>> Digitalmars-d wrote: [...]
>> > You should try to use the popcnt instruction of the cpu (in
>> > asm or
>> > with intrinsics). It's at least 3 or 4 order of magnitude
>> > (i.e.
>> > between 1000 and 10000 times) faster than any of these
>> > tricks.
> [...]
>> As it turns out, my CPU (AMD Phenom II X6 1055T) doesn't
>> support the
>> instruction. :-(
> [...]
>
> Actually, I was wrong, my CPU *does* have this instruction, but
> I needed to pass the right -mcpu flag to LDC before it will
> emit it. After turning it on, I got a huge performance boost;
> it completely moved the function that calls popcnt out of the
> hotspot into the background! Now it's other functions that have
> become the bottleneck. Cool stuff!
>
Cool, I just wanted to comment as I have myself a Phenom II X6
1090T and was wondering.
Btw, if you want to use the Kernighan function for values with a
lot of set bits, count the inverted value:
That's what I had in my code (gcc C)
DEFINE_INLINE int PopCountFewClr(uint64_t w)
{
#ifdef __POPCNT__
return __builtin_popcountll(w);
#else
return CHAR_BIT*sizeof w - PopCountFewSet(~w);
#endif
}
More information about the Digitalmars-d
mailing list