int opEquals(Object), and other legacy ints (!)

Mon Aug 7 10:21:16 PDT 2006

kris wrote:
> Lionello Lunesu wrote:
>>> But indeed, a cmp/sete combo seems to do the same in fewer instructions.
>>
>>
>> But is it faster? I've noticed that many of the higher-level assembly
>> instructions are actually slower than multiple lower-level ones. "loop" is
>> the best example of this (dec ecx/jne is faster), or "rep" (again, dec/jne
>> is faster).
>>
>> L.
>>
> 
> If you'd looked at the setne instruction linked previously, you'd have 
> seen that it consumes 3 cycles. And no; there are no jumps, loops, or any 
> other reason to cause pipeline bubbles. If you need a primer on what 
> causes modern CPUs to stall (the silly P4 in particular) then you could 
> do a lot worse than to read the articles by Jon Stokes at ArsTechnica.
> 
> Oh, and this is just daft. Why don't we all count the cycles for a 
> call/return instead? Or, perhaps just exactly what it costs to compare 
> the bytes of two strings until they start to look different? You'll find 
> the cost of setne (and probably even the prior "extra" three 
> instructions for boolean support) is relegated to background noise.
> 
> Let's face it: int is likely used instead of bool for historical 
> reasons; probably just an artifact left over from pre-80386 days. Would 
> be nice to get that codegen cleaned up ~ especially since it was W who 
> claimed the reasons were performance related. Hacking the high-level 
> code with int vs boolean, just to reflect some archaic machine 
> instruction, is one of those things that come under the umbrella of 
> "premature optimization".

Yeah, AFAIK setne is supported from the 386 onward, and a quick check of the GDC code that uses it 
seems to indicate it is faster (going by the Eq1 and Eq2 samples earlier in the thread).
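
For anyone skimming the thread, here is a rough sketch in C of the two return conventions being argued about. These functions are hypothetical stand-ins of my own, not the Eq1 and Eq2 samples, and the names are made up for illustration. The int versions only promise zero vs. non-zero; the bool versions promise exactly 0 or 1, which is where a setne/sete-style instruction would typically show up on x86 (actual codegen depends on the compiler and flags).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical legacy-style equality: the caller only tests zero vs. non-zero. */
static int eq_int(int a, int b)
{
    return a == b;
}

/* Hypothetical bool-returning equality: the result is exactly 0 or 1. */
static bool eq_bool(int a, int b)
{
    return a == b;
}

/* int-style "differs": may hand back any non-zero value, no normalizing step. */
static int differs_int(int a, int b)
{
    return a ^ b;
}

/* bool-style "differs": the comparison normalizes the result to 0 or 1. */
static bool differs_bool(int a, int b)
{
    return (a ^ b) != 0;
}
```

Both flavours agree on truthiness; only differs_int can return something other than 0 or 1 (e.g. differs_int(5, 3) is 6), which is the whole of the "saving" under discussion.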

But you're right - in many cases it will probably be background noise anyhow 'cause you only save a 
couple of cycles.

As an aside, I think the current DMD backend may be well suited to the new Dual Core CPU because it 
hasn't been chasing after optimum performance on the P4 with its 20-stage pipeline or whatever <g>