tooling quality and some random rant
Walter Bright
newshound2 at digitalmars.com
Fri Feb 18 23:54:01 PST 2011
nedbrek wrote:
> Reordering happens in the scheduler. A simple model is "Fetch", "Schedule",
> "Retire". Fetch and retire are done in program order. For code that is
> hitting well in the cache, the biggest bottleneck is that "4" decoder (the
> complex instruction decoder). Reducing the number of complex instructions
> will be a big win here (and settling them into the 4-1-1(-1) pattern).
>
> Of course, on anything after Core 2, the "1" decoders can handle pushes,
> pops, and load-ops (r+=m) (although not load-op-store (m+=r)).
>
> Also, "macro op fusion" lets you get a branch along with the last
> instruction in decode, potentially giving you 5 macroinstructions per cycle
> from decode. Make sure the instruction right before the branch is the
> flags-producing one (cmp-br).
>
> (I used to work for Intel :)
I can't find any Intel documentation on this. Can you point me to some?
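
For reference, the 4-1-1(-1) pattern described above can be pictured with a toy model. This is only a sketch under stated assumptions, not anything taken from Intel documentation: one complex decoder handles an instruction of up to 4 micro-ops, three simple decoders handle 1 micro-op each, decode is strictly in program order, and macro op fusion and the microcode sequencer are ignored. The function name and parameters are made up for illustration.

```python
def decode_cycles(uop_counts, simple_decoders=3, complex_max=4):
    """Count decode cycles for a list of per-instruction micro-op counts.

    Toy 4-1-1(-1) model (an assumption, not a documented Intel algorithm):
    slot 0 is the complex decoder, the remaining slots are simple decoders
    that only accept single-micro-op instructions.
    """
    cycles = 0
    i = 0
    n = len(uop_counts)
    while i < n:
        cycles += 1
        # Slot 0: the complex decoder takes the next instruction, simple or not.
        if uop_counts[i] > complex_max:
            raise ValueError("instruction exceeds the toy model's complex decoder")
        i += 1
        # Remaining slots: keep going only while the next instruction is simple.
        used = 0
        while i < n and used < simple_decoders and uop_counts[i] == 1:
            i += 1
            used += 1
    return cycles


# Complex instructions settled into the 4-1-1-1 pattern: 8 instructions, 2 cycles.
well_ordered = decode_cycles([2, 1, 1, 1, 2, 1, 1, 1])

# Complex instructions landing on simple-decoder slots: 6 instructions, 4 cycles.
badly_ordered = decode_cycles([1, 2, 1, 2, 1, 2])
```

In this model a multi-micro-op instruction that does not fall on the first slot ends the decode group early, which is why fewer complex instructions, or better placement of them, reduces the cycle count.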