core.bitop.bt not faster than & ?

Trollgeir via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Wed Dec 17 08:08:40 PST 2014


On Wednesday, 17 December 2014 at 14:58:13 UTC, Adam D. Ruppe 
wrote:
> On Wednesday, 17 December 2014 at 14:12:16 UTC, Trollgeir wrote:
>> I'd expect the bt function to be up to 32 times faster, as I 
>> thought it only compared two bits rather than the entire 
>> length of bits in the uint.
>
> The processor doesn't work in terms of bits like that - it 
> still needs to look at the whole integer. In fact, according to 
> my (OLD) asm reference, the bt instruction is slower than the 
> and instruction at the CPU level.
>
> I think it has to do a wee bit more work: translating the 16 
> into a mask, then moving the result into the carry flag... then 
> moving the flag back into a register to return the value. (That 
> last step could probably be skipped if you do an if() on it and 
> the compiler optimizes the branch, and the first step might be 
> skipped too if it is a constant, since the compiler can rewrite 
> the instruction. So given that, I'd expect what you saw: no 
> difference when they are optimized to the same thing or when 
> the CPU's stars align right, and & a bit faster when bt isn't 
> optimized.)
>
> bt() and friends are special instructions for specialized use 
> cases. Probably useful for threading and stuff.


Thanks for the explanation; I suspected it might work something 
like that.
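
For reference, the comparison I was timing boils down to 
something like this (a minimal sketch; the bit index 16 is just 
an example):

import core.bitop : bt;

void main()
{
    // bt wants a size_t*, so the variable is size_t here
    size_t flags = 1 << 16;                  // bit 16 set

    // bt tests a single bit; on x86 it maps to the BT instruction
    bool viaBt  = bt(&flags, 16) != 0;

    // & builds a mask and tests it; an optimizing compiler often
    // turns both forms into the same machine code
    bool viaAnd = (flags & (1 << 16)) != 0;

    assert(viaBt == viaAnd);
}

That lines up with what you describe: once the compiler 
optimizes both down to the same thing, there is no difference 
left to measure.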

For my implementation, I have bits shifting to the right every 
update, and I want to check whether any of them have reached 
certain marker positions. Hence, it felt really inefficient to 
check every single bit in the uint when I was only interested in 
a few specific ones. Is an alternative (optimized) version of bt 
even possible?
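
For concreteness, what I have now is roughly the following (the 
marker positions below are made up for illustration):

uint pipeline;                     // bits shift one step right per update
enum uint markers = (1 << 0) | (1 << 8);   // positions I care about

void update()
{
    pipeline >>= 1;                // everything moves one position right

    if (pipeline & markers)        // one & tests all marker bits at once
    {
        // something of interest has reached a marker position
    }
}

Since a single & against a multi-bit mask already checks all the 
marker positions in one instruction, maybe that is as cheap as 
it gets?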

