core.bitop.bt not faster than & ?

Wed Dec 17 06:58:12 PST 2014

On Wednesday, 17 December 2014 at 14:12:16 UTC, Trollgeir wrote:
> I'd expect the bt function to be up to 32 times faster as I 
> thought it only compared two bits, and not the entire length of 
> bits in the uint.

The processor doesn't work in terms of bits like that - it still 
needs to look at the whole integer. In fact, according to my 
(OLD) asm reference, the bt instruction is slower than the and 
instruction at the cpu level.

I think it has to do a wee bit more work: translating the 16 into 
a mask, then moving the result into the carry flag, then moving 
the flag back into a register to return the value. The last step 
could probably be skipped if you do an if() on it and the 
compiler optimizes the branch, and the first step might be 
skipped too if the bit number is a constant, since the compiler 
can rewrite the instruction. So given that, I'd expect what you 
saw: no difference when they are optimized to the same thing or 
when the CPU's stars align right, and & a bit faster when bt 
isn't optimized.
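
For reference, here is a minimal sketch of the two ways to test a 
bit that I'm comparing. The helper names testWithBt and 
testWithAnd are made up for illustration, and I'm assuming the 
druntime signature where bt takes a pointer to size_t (not uint) 
plus a bit number. It only checks that both give the same answer; 
it says nothing about speed.

```d
import core.bitop : bt;

// Hypothetical helper: test bit `n` of `value` with the bt intrinsic.
// bt returns a non-zero int if the bit is set, zero otherwise.
bool testWithBt(size_t value, size_t n)
{
    return bt(&value, n) != 0;
}

// Hypothetical helper: the same test with an explicit mask and &.
bool testWithAnd(size_t value, size_t n)
{
    return (value & (cast(size_t) 1 << n)) != 0;
}

void main()
{
    size_t word = cast(size_t) 1 << 16; // only bit 16 is set

    // Both approaches agree on a set bit and on a clear bit.
    assert(testWithBt(word, 16) && testWithAnd(word, 16));
    assert(!testWithBt(word, 15) && !testWithAnd(word, 15));
}
```

With a constant bit number like 16, an optimizing compiler is 
free to turn both helpers into the same instructions, which 
would explain seeing no difference in a benchmark.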

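bt() and friends are special instructions for specialized use 
cases, such as testing a bit in an array of words, where the bit 
number can index past the first size_t. Probably useful for 
threading and stuff too.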