tooling quality and some random rant

Fri Feb 18 23:54:01 PST 2011

nedbrek wrote:
> Reordering happens in the scheduler. A simple model is "Fetch", "Schedule", 
> "Retire".  Fetch and retire are done in program order.  For code that is 
> hitting well in the cache, the biggest bottleneck is that "4" decoder (the 
> complex instruction decoder).  Reducing the number of complex instructions 
> will be a big win here (as will arranging them to fit the 4-1-1(-1) pattern).
> 
> Of course, on anything after Core 2, the "1" decoders can handle pushes, 
> pops, and load-ops (r+=m) (although not load-op-store (m+=r)).
> 
> Also, "macro op fusion" lets you get a branch along with the last 
> instruction in decode, potentially giving you 5 macroinstructions per cycle 
> from decode.  Make sure the instruction right before the branch is the 
> flags-producing one (cmp-br).
> 
> (I used to work for Intel :)

I can't find any Intel documentation on this. Can you point me to some?
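To make the decode-throughput claims in the quote concrete, here is a toy model of a 4-1-1-1 decoder with cmp+jcc fusion. It is a sketch under stated assumptions, not Intel's documented decoder rules. The helper `decode_cycles`, the `Insn` fields, and the uop counts are all hypothetical. The model assumes that slot 0 takes up to 4 uops, the other slots take only 1-uop instructions, and at most one fused pair is decoded per cycle. Taken branches, fetch limits and microcode are ignored.

```python
from dataclasses import dataclass


@dataclass
class Insn:
    name: str
    uops: int                        # fused-domain uops (assumed counts, 1..4)
    fuses_with_branch: bool = False  # cmp/test in this model
    is_branch: bool = False          # conditional branch (jcc)


def decode_cycles(insns, width=4, fusion=True):
    """Count decode cycles for straight-line code under a toy 4-1-1-1 model.

    Simplifying assumptions (not Intel's exact rules):
    - slot 0 takes an instruction of up to 4 uops; slots 1..width-1 take
      only 1-uop instructions;
    - decode is in program order, so a multi-uop instruction that would land
      in slot 1..width-1 ends the cycle and waits for slot 0;
    - with fusion on, a cmp/test immediately followed by a jcc occupies one
      slot, at most once per cycle;
    - taken branches, fetch-window limits and microcode are ignored.
    """
    cycles, i, n = 0, 0, len(insns)
    while i < n:
        cycles += 1
        slot, fused = 0, False
        while i < n and slot < width:
            insn = insns[i]
            if slot > 0 and insn.uops > 1:
                break  # complex instruction must start the next cycle in slot 0
            nxt = insns[i + 1] if i + 1 < n else None
            if (fusion and not fused and insn.fuses_with_branch
                    and nxt is not None and nxt.is_branch):
                i += 2  # cmp+jcc decoded as one unit in one slot
                fused = True
            else:
                i += 1
            slot += 1
    return cycles


COMPLEX = Insn("add [mem], reg", uops=2)  # load-op-store: decoder 0 only here
SIMPLE = Insn("add reg, [mem]", uops=1)   # load-op: fits a simple decoder
MOV = Insn("mov reg, reg", uops=1)
CMP = Insn("cmp reg, reg", uops=1, fuses_with_branch=True)
JCC = Insn("jne top", uops=1, is_branch=True)

print(decode_cycles([COMPLEX, COMPLEX, SIMPLE, SIMPLE, SIMPLE, SIMPLE]))  # 3
print(decode_cycles([COMPLEX, SIMPLE, SIMPLE, SIMPLE, COMPLEX, SIMPLE]))  # 2
print(decode_cycles([MOV, MOV, MOV, CMP, JCC]))                # 1
print(decode_cycles([MOV, MOV, MOV, CMP, JCC], fusion=False))  # 2
print(decode_cycles([CMP, MOV, MOV, MOV, JCC]))                # 2
```

In this model, reordering the same six instructions into the 4-1-1-1 shape saves a cycle. Keeping the cmp directly before the jcc lets five instructions decode in one cycle. Separating them loses the fusion.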