tooling quality and some random rant

nedbrek nedbrek at yahoo.com
Sat Feb 19 14:46:32 PST 2011


"distcc" <c at p.p> wrote in message news:ijp9ji$1hvd$1 at digitalmars.com...
> nedbrek Wrote:
>> "Walter Bright" <newshound2 at digitalmars.com> wrote in message
>> news:ijnt3o$22dm$1 at digitalmars.com...
>>> nedbrek wrote:
>>>> Also, "macro op fusion" allows you to get a branch along with the last
>>>> instruction in decode, potentially giving you 5 macroinstructions per
>>>> cycle from decode.  Make sure it is the flags-producing instruction
>>>> (cmp-br).
>>>>
>>>
>>> I can't find any Intel documentation on this. Can you point me to some?
>>
>> The best available source is the optimization reference manual
>> (http://www.intel.com/products/processor/manuals/).  The latest version is
>> 248966.pdf, which mentions "Decodes up to four instructions, or up to five
>> with macro-fusion" (page 33).  Also, page 36: "Macro-fusion merges two
>> instructions into a single µop. Intel Core microarchitecture is capable of
>> one macro-fusion per cycle in 32-bit operation".  It's unclear if macro
>> fusion is off entirely in 64 bit mode, and whether this has changed in more
>> recent processors...
>
> I remember reading that macro fusion is entirely off in 64 bit mode in
> Nehalem and earlier generations, and supported in Sandy Bridge.
>
> When generating code for loops, the compiler could also make use of the
> Loop Stream Detector to avoid i-cache misses.

Serves me right, it is a little further in, page 52: "In Intel
microarchitecture (Nehalem), macro-fusion is supported in 64-bit mode, and
the following instruction sequences are supported: (big list)".

That would leave macro-fusion off in 64-bit mode on the 65nm (Merom) and
45nm (Penryn) parts.  These are identifiable through CPUID.
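
For what it's worth, here is a rough sketch of that CPUID check.  It only
decodes a leaf-1 EAX signature that you have already read some other way;
it does not execute CPUID itself.  The model numbers in the three tables
are my assumption from Intel's published family/model listings, not
something stated in the optimization manual pages quoted above, and the
function names are made up for illustration.

```python
# Sketch: decide from a CPUID.01H:EAX signature whether macro-fusion is
# expected to work in 64-bit mode.  The model tables are assumptions taken
# from Intel's public family/model listings, not from this thread.

MEROM_MODELS = {0x0F}                      # 65nm Core 2 (assumed)
PENRYN_MODELS = {0x17, 0x1D}               # 45nm Core 2 (assumed)
NEHALEM_MODELS = {0x1A, 0x1E, 0x1F, 0x2E}  # Nehalem (assumed)


def decode_signature(eax):
    """Return (display_family, display_model) for a leaf-1 EAX value."""
    base_family = (eax >> 8) & 0xF
    base_model = (eax >> 4) & 0xF
    ext_family = (eax >> 20) & 0xFF
    ext_model = (eax >> 16) & 0xF

    # The extended family is only added when the base family is 0xF.
    family = base_family + ext_family if base_family == 0xF else base_family
    # The extended model only applies to base families 6 and 0xF.
    if base_family in (0x6, 0xF):
        model = (ext_model << 4) | base_model
    else:
        model = base_model
    return family, model


def fusion_in_64bit(eax):
    """True/False for the parts discussed here, None if not in the tables."""
    family, model = decode_signature(eax)
    if family != 6:
        return None
    if model in MEROM_MODELS or model in PENRYN_MODELS:
        return False   # fusion in 32-bit mode only, per the manual
    if model in NEHALEM_MODELS:
        return True    # page 52: supported in 64-bit mode
    return None        # anything else: go read that chip's section


if __name__ == "__main__":
    for sig in (0x000006F6, 0x00010676, 0x000106A5):
        print(hex(sig), decode_signature(sig), fusion_in_64bit(sig))
```

Returning None for anything outside the tables is deliberate: a compiler
should not guess about parts the guide has not been checked for.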

The guide is broken up into sections based on the particular chip, so you 
end up having to read them all to get a general feel for things...

Ned



