value range propagation for _bitwise_ OR
Adam D. Ruppe
destructionator at gmail.com
Tue Apr 13 08:42:42 PDT 2010
On Tue, Apr 13, 2010 at 11:10:24AM -0400, Clemens wrote:
> That's strange. Looking at src/backend/cod4.c, function cdbscan, in the dmd sources, bsr seems to be implemented in terms of the bsr opcode [1] (which I guess is the reason it's an intrinsic in the first place). I would have expected this to be much, much faster than a user function. Anyone care enough to check the generated assembly?
The opcode is fairly slow anyway (as far as opcodes go). Odds are the
implementation inside the processor is similar to Jerome's method, and
the main savings come from it loading fewer bytes into the pipeline.
I remember a line from a blog (IIRC it was written by the author of the
C++ FQA) saying that hardware and software are pretty much the same thing:
moving an instruction into hardware doesn't mean it will be any faster,
since it is the same algorithm, just run in processor microcode instead
of as user opcodes.
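To make the "same algorithm either way" point concrete, here is a minimal sketch of what a software bsr can look like. It is a generic binary-search bit scan, not necessarily Jerome's method (which isn't reproduced in this message), and the function names bsr_binary and bsr_naive are hypothetical. Both return the index of the highest set bit, which is what the bsr instruction yields; like bsr, the result for 0 is undefined, so the sketch asserts x != 0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical software bit-scan-reverse: index of the highest set bit of a
 * nonzero 32-bit value. Binary search: if anything is set in the upper half
 * of the current window, shift it down and add the half-width to the result.
 * Five fixed steps cover all 32 bits. */
unsigned bsr_binary(uint32_t x)
{
    assert(x != 0);              /* bsr's result is undefined for 0 */
    unsigned r = 0;
    if (x & 0xFFFF0000u) { x >>= 16; r += 16; }
    if (x & 0x0000FF00u) { x >>= 8;  r += 8;  }
    if (x & 0x000000F0u) { x >>= 4;  r += 4;  }
    if (x & 0x0000000Cu) { x >>= 2;  r += 2;  }
    if (x & 0x00000002u) {           r += 1;  }
    return r;
}

/* Hypothetical naive reference: count how many right shifts it takes
 * for the value to become 0 (up to 31 loop iterations). */
unsigned bsr_naive(uint32_t x)
{
    assert(x != 0);
    unsigned r = 0;
    while (x >>= 1)
        r++;
    return r;
}
```

For example, bsr_binary(1000u) is 9, since 512 is the highest power of two in 1000. On GCC and Clang, 31 - __builtin_clz(x) gives the same answer for a nonzero 32-bit unsigned int; whether that compiles to a single instruction, and how fast that instruction is, depends on the target CPU.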
--
Adam D. Ruppe
http://arsdnet.net
More information about the Digitalmars-d mailing list