Precise GC

Fri Apr 13 07:25:04 PDT 2012

On Friday, 13 April 2012 at 13:54:39 UTC, Manu wrote:
> No other processors have branch prediction units anywhere near
> the sophistication of modern x86. Any call through a function
> pointer stalls the pipeline, pipelines are getting longer all the
> time, and PPC has even more associated costs/hazards.
> Most processors can only perform trivial binary branch prediction
> around an 'if'.
> It also places burden on the icache (unable to prefetch), and of
> course the dcache, both of which are much less sophisticated than
> x86 as well.

Allocating a small aggregate object usually means allocating
several equally small objects of different types in a row, so they
sit one after another in the heap. The GC will then visit them in
that order, calling a different scanning function for each object
than it called for the previous one. On an x86 processor that would
result in constant misprediction: AFAIK an x86 processor caches
only one target address per branch (ARM caches a flag?). The icache
should not suffer in either case: once a function has been fetched,
it stays in the icache and is reused from there the next time.
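
To make the misprediction argument concrete, here is a rough sketch in
Python. It is a toy model, not a measurement: it assumes the predictor
remembers exactly one target per call site (the "AFAIK" above), and the
names scan_node, scan_key and scan_value are made up for illustration.

```python
def count_mispredictions(targets):
    """Count misses for a toy predictor that remembers only the last
    target taken at a single indirect call site (an assumption, not a
    model of any real CPU)."""
    last = None
    misses = 0
    for target in targets:
        if target != last:
            misses += 1  # predicted the previous target, got a new one
        last = target
    return misses


# Hypothetical heap order: each small aggregate allocates three objects
# of different types in a row, so the GC sees the types interleaved.
interleaved = ["scan_node", "scan_key", "scan_value"] * 4

# The same objects, visited grouped by type instead.
grouped = sorted(interleaved)

print(count_mispredictions(interleaved))  # 12
print(count_mispredictions(grouped))      # 3
```

Under this assumption every call in the interleaved order misses, while
the grouped order misses only when the type changes.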