value range propagation for _bitwise_ OR

Adam D. Ruppe destructionator at gmail.com
Tue Apr 13 08:42:42 PDT 2010


On Tue, Apr 13, 2010 at 11:10:24AM -0400, Clemens wrote:
> That's strange. Looking at src/backend/cod4.c, function cdbscan, in the dmd sources, bsr seems to be implemented in terms of the bsr opcode [1] (which I guess is the reason it's an intrinsic in the first place). I would have expected this to be much, much faster than a user function. Anyone care enough to check the generated assembly?

The opcode is fairly slow anyway (as far as opcodes go). Odds are the
implementation inside the processor is similar to Jerome's method, and
the main savings come from loading fewer bytes into the pipeline.
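
To make the comparison concrete, here is a minimal C sketch of a software
bit-scan-reverse done as a shift-and-test binary search. It is only an
assumption about the general kind of loop under discussion: it is not
Jerome's actual code (which isn't quoted here), it says nothing certain
about what the processor does internally, and the name highest_set_bit
is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Index (0..31) of the highest set bit in v, found by binary search.
   Like the bsr instruction, the result is undefined for v == 0,
   so that case is rejected up front. Hypothetical helper name. */
static unsigned highest_set_bit(uint32_t v)
{
    assert(v != 0);
    unsigned index = 0;
    /* Halve the search window each step: 16, 8, 4, 2, 1 bits. */
    for (unsigned shift = 16; shift != 0; shift >>= 1) {
        if (v >> shift) {       /* a bit is set in the upper half */
            v >>= shift;        /* continue searching in that half */
            index += shift;     /* and record how far we shifted */
        }
    }
    return index;
}
```

That is five compare-and-shift steps for a 32-bit value, each of which
costs several bytes of instructions, versus a single short opcode.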

I remember a line from a blog (IIRC it was written by the author of the
C++ FQA) saying that hardware and software are pretty much the same
thing: moving an instruction to hardware doesn't mean it will be any
faster, since it is the same algorithm, just done in processor microcode
instead of user opcodes.

-- 
Adam D. Ruppe
http://arsdnet.net


