{OT} Youtube Video: newCTFE: Starting to write the x86 JIT

Patrick Schluter via Digitalmars-d digitalmars-d at puremagic.com
Tue Apr 25 09:16:43 PDT 2017


On Tuesday, 25 April 2017 at 09:09:00 UTC, Ola Fosheim Grøstad 
wrote:
> On Monday, 24 April 2017 at 17:48:50 UTC, Stefan Koch wrote:
>> [...]
>
> Oh, ok. AFAIK the decoding of addressing modes into micro-ops 
> (the real instructions used inside the CPU, not the actual 
> opcodes) has no effect on the caching system. It may, however, 
> compress the generated code so that you don't flush the 
> instruction cache, and it speeds up the decoding of opcodes 
> into micro-ops.
>
> If you want to improve cache loads you have to consider when to 
> use the "prefetch" instructions, but the effect (positive or 
> negative) varies greatly between CPU generations so you will 
> basically need to target each CPU-generation individually.
>
> Probably too much work to be worthwhile, as it usually doesn't 
> pay off until you work on large datasets, and then you usually 
> have to be careful to partition the data into cache-friendly 
> working sets. Probably not so easy to do for a JIT.
>
> You'll probably get a decent performance boost without worrying 
> about caching too much in the first implementation anyway. Any 
> gains in that area could be obliterated in the next CPU 
> generation... :-/

That has already happened. Intel and AMD (especially for Ryzen) 
have strongly discouraged the use of prefetch instructions since 
at least the Core 2 and Athlon 64 generations. The icache cost 
of the extra instructions rarely pays off, and explicit 
prefetching very often defeats the hardware auto-prefetcher 
algorithms while wasting memory bandwidth.
