core.bitop.bt not faster than & ?
Trollgeir via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Wed Dec 17 08:08:40 PST 2014
On Wednesday, 17 December 2014 at 14:58:13 UTC, Adam D. Ruppe
wrote:
> On Wednesday, 17 December 2014 at 14:12:16 UTC, Trollgeir wrote:
>> I'd expect the bt function to be up to 32 times faster as I
>> thought it only compared two bits, and not the entire length
>> of bits in the uint.
>
> The processor doesn't work in terms of bits like that - it
> still needs to look at the whole integer. In fact, according to
> my (OLD) asm reference, the bt instruction is slower than the
> and instruction at the CPU level.
>
> I think it has to do a wee bit more work: translating the bit
> index (the 16) into a mask, then moving the result into the
> carry flag... then moving the flag back into a register to
> return the value. (That last step could probably be skipped if
> you do an if() on it and the compiler optimizes the branch, and
> the first step might be skipped too if the index is a constant,
> since the compiler can rewrite the instruction. So given that,
> I'd expect what you saw: no difference when they are optimized
> to the same thing or when the CPU's stars align right, and & a
> bit faster when bt isn't optimized.)
>
> bt() and friends are special instructions for specialized use
> cases - the atomic bit-test-and-set variants, for instance, can
> be useful for threading.
Thanks for the explanation, I suspected it might work something
like that.
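For the record, here is a minimal D sketch of the two forms being compared (the helper names are mine, not from the original benchmark):

```d
import core.bitop : bt;

/// Tests bit `bitnum` of `value` with the bt intrinsic.
bool viaBt(size_t value, size_t bitnum)
{
    // bt() takes a pointer to size_t storage and a bit index,
    // and returns nonzero if that bit is set.
    return bt(&value, bitnum) != 0;
}

/// Same test with a plain mask; the optimizer often emits
/// equivalent (or better) code for this form.
bool viaAnd(size_t value, size_t bitnum)
{
    return (value & (size_t(1) << bitnum)) != 0;
}
```

Timing both with optimizations on should show little or no difference once the compiler has rewritten them, which would match what was measured.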
For my implementation, I have bits shifting to the right every
update, and I want to check whether any of them has reached
certain marker positions. It felt really inefficient to check
every single bit in the uint when I was only interested in a few
specific ones. Is an optimized alternative to bt even possible?
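For that shifting-marker use case, a single AND against a combined mask tests all the markers at once - the CPU reads the whole uint either way, so checking several positions costs the same as checking one. A sketch (the marker positions below are made up for illustration):

```d
/// True when any marker bit is set. The positions here are
/// hypothetical examples, not from the original post.
bool hitsMarker(uint bits)
{
    enum uint markers = (1u << 3) | (1u << 7) | (1u << 12);
    // One AND tests every marker position at once; the CPU
    // operates on the whole uint regardless.
    return (bits & markers) != 0;
}
```

Each update you would shift (`bits >>= 1`) and call hitsMarker once, rather than testing each marker bit separately.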