Linus with some good observations on garbage collection

Wed May 4 18:57:15 PDT 2011

On 5/4/2011 7:38 PM, Don wrote:
> Jérôme M. Berger wrote:
>> Bruno Medeiros wrote:
>>> On 29/04/2011 17:30, Andrei Alexandrescu wrote:
>>>> On 4/29/11 10:40 AM, Bruno Medeiros wrote:
>>>>> On 23/04/2011 15:45, Andrei Alexandrescu wrote:
>>>>>> On 4/23/11 8:57 AM, dsimcha wrote:
>>>>>>> BTW, since when does the ternary operator work with functions, as
>>>>>>> opposed to variables?
>>>>> And that's since the C days btw. The function call is just an operation,
>>>>> the callee an operand, so any expression can be there.
>>>>>
>>>>>> They are converted to pointers to functions. Not something I'd recommend
>>>>>> because it makes the call slower.
>>>>>>
>>>>>> Andrei
>>>>> Hm, I didn't know that. Why does it make the call slower (other than
>>>>> the actual ternary operation)? Is it because it has to load the function
>>>>> address from a register/variable, instead of it being a constant value?
>>>>> And/or not being able to inline the call?
>>>>> I mean, the first aspect seems like a very minor impact on performance,
>>>>> almost negligible.
>>>> It's an indirect call instead of a direct call.
>>>>
>>>> Andrei
>>> Well, yes, that's kind of what I was thinking already (although I
>>> referred to it in more low-level, assembler terms).
>>> But my question remains, why is an indirect function call slower than a
>>> direct one, and is that difference significant? (excluding of course the
>>> possibilities of inlining and other static analysis that direct
>>> invocations offer) There is no extra overhead other than loading the
>>> function address from a register/variable, right?
>>>
>> AIUI, with a direct call the CPU can anticipate the call (because
>> it knows the target address) and start filling the pipeline
>> immediately. OTOH, with an indirect call the CPU cannot know the
>> target until it reads the register, which means that the pipeline
>> will empty itself before the call is executed. With the pipeline
>> length of some modern CPUs (Pentium 4 being amongst the worst), this
>> can result in very large slowdowns.
>>
>> Jerome
>
> That was true ten years ago, but not any more. Modern CPUs can do branch
> prediction of indirect jumps.

Yeah, I tested this a while back and can dig up or recreate the benchmark
if need be. A virtual function call that always resolves to the same
address (so the branch is highly predictable) is not measurably slower
than a _non-inlined_ direct function call. However, if the branch is
not predictable, it's substantially slower, and virtual functions usually
can't be inlined either.
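
For anyone who wants to try this themselves, here is a rough sketch of the
kind of test I mean. It is not the original benchmark, and it is written in
C++ rather than D. All the names (add_one, add_two, call_selected,
make_targets, time_calls) are made up for illustration. It shows the ternary
picking between two functions, and a loop that calls through a table of
function pointers that either all point at the same target (predictable) or
are a random mix (less predictable).

```cpp
#include <chrono>
#include <cstddef>
#include <random>
#include <vector>

// Two hypothetical callees with the same signature.
int add_one(int x) { return x + 1; }
int add_two(int x) { return x + 2; }

using Fn = int (*)(int);

// The ternary selects one of two functions and the result is called.
// Unless the optimizer rewrites it, this is an indirect call.
int call_selected(bool pick_first, int x) {
    return (pick_first ? add_one : add_two)(x);
}

// Build a table of call targets. With predictable == true every entry is
// add_one; otherwise each entry is chosen by a seeded coin flip.
std::vector<Fn> make_targets(std::size_t n, bool predictable,
                             unsigned seed = 42) {
    std::mt19937 rng(seed);
    std::bernoulli_distribution coin(0.5);
    std::vector<Fn> targets(n);
    for (auto& t : targets) {
        t = (predictable || coin(rng)) ? add_one : add_two;
    }
    return targets;
}

struct Result {
    long long sum;       // accumulated results, so the calls are not dead code
    double ns_per_call;  // average wall-clock time per indirect call
};

// Call every entry of the table `rounds` times and time the whole loop.
Result time_calls(const std::vector<Fn>& targets, int rounds) {
    long long sum = 0;
    auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < rounds; ++r) {
        for (Fn f : targets) {
            sum += f(r);
        }
    }
    auto stop = std::chrono::steady_clock::now();
    double ns = std::chrono::duration<double, std::nano>(stop - start).count();
    double calls = static_cast<double>(rounds) *
                   static_cast<double>(targets.size());
    return {sum, calls > 0 ? ns / calls : 0.0};
}
```

To compare, call time_calls(make_targets(n, true), rounds) and
time_calls(make_targets(n, false), rounds) from main and print ns_per_call
for each. Use a large n: with a small table the predictor may simply learn
the repeating pattern and hide the difference. Results will vary with the
compiler, optimization flags and CPU, so treat any numbers as indicative
only.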