Andrei Alexandrescu needs to read this

Ola Fosheim Grøstad ola.fosheim.grostad at gmail.com
Thu Oct 31 11:54:27 UTC 2019


On Sunday, 27 October 2019 at 20:11:38 UTC, Mark wrote:
> Would it be reasonable to say that modern CPUs basically do JIT 
> compilation of assembly instructions?

Old CISC CPUs did just that: they translated instructions into 
microcode, so they could offer high-level instructions and were 
even reprogrammable... (I guess x86 still has that feature, at 
least to some extent).

Then RISC CPUs came along in the late 80s and 90s and didn't do 
that, so they were faster and more compact because they could get 
by with a much simpler decoder (the bits in the instructions were 
carefully laid out so that decoding was nearly instantaneous). 
But then memory bandwidth became an issue and developers started 
to write more and more bloated software...

x86 is an old CISC architecture and survives mainly because of 
market dominance and R&D investment. Also, with increased real 
estate (more transistors) they can afford to spend a lot of die 
area on instruction decoding...

The major change over the past 40 years that causes this 
sensitivity to instruction ordering is that modern CPUs have deep 
pipelines (many instructions in flight at once in a long staging 
queue), are superscalar (execute instructions in parallel), 
execute instructions speculatively (even though the results might 
be discarded later), unroll tight loops before pipelining, and 
use various branch-prediction schemes (so that they execute the 
likely sequence after a branch before they know what the 
branch condition looks like).
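
To make the branch-prediction point concrete, here is a small D 
sketch (my own illustrative example, names and numbers made up): 
the same loop runs noticeably faster over sorted data than over 
random data on a typical out-of-order core, simply because the 
branch becomes predictable. Timings will vary with the CPU and 
compiler flags (try e.g. ldc2 -O2).

import std.stdio : writefln;
import std.random : uniform;
import std.algorithm : sort;
import std.datetime.stopwatch : StopWatch, AutoStart;

long conditionalSum(const int[] data)
{
    long sum = 0;
    foreach (x; data)
        if (x >= 128)   // hard to predict on random input,
            sum += x;   // trivially predictable on sorted input
    return sum;
}

void main()
{
    auto data = new int[](1 << 20);
    foreach (ref x; data)
        x = uniform(0, 256);

    auto sw = StopWatch(AutoStart.yes);
    auto a = conditionalSum(data);   // branch outcome is effectively random
    auto tRandom = sw.peek;

    data.sort();                     // same data and work, but now predictable
    sw.reset();
    auto b = conditionalSum(data);
    auto tSorted = sw.peek;

    writefln("random: %s  sorted: %s  (sums %s / %s)",
             tRandom, tSorted, a, b);
}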

Is this a good approach? Probably not... You would get much 
better performance from the same number of transistors by using 
many simple cores and a clever memory architecture, but that 
would not work with current software and development practice...

> branch predictor and so on. If so, you could argue that the 
> Itanium was an attempt to avoid this "runtime" and transfer all 
> these responsibilities to the compiler and/or programmer. Not a 
> very successful one, apparently.

VLIW is not a bad concept relative to RISC, but perhaps not 
profitable in terms of R&D.

You could probably get better instruction scheduling if it were 
determined statically to be optimal, since the compiler would 
then have a "perfect model" of how much of the CPU is being 
utilized, and could give programmers feedback on it too. But then 
you would need to recompile software for the actual CPU, have 
more advanced compilers, and perhaps write software in a 
different manner to avoid bad branching patterns.
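
To illustrate what a static scheduler could exploit, here is a 
rough D sketch (again just something I made up, not output from 
any real VLIW toolchain): the single accumulator in sumChained 
forms one long dependency chain, while the four independent 
accumulators in sumUnrolled expose instruction-level parallelism 
that either the CPU's out-of-order hardware or a VLIW compiler 
with a machine model could pack into parallel issue slots.

import std.stdio : writeln;

double sumChained(const double[] a)
{
    double s = 0;
    foreach (x; a)
        s += x;          // every add waits on the previous one: one long chain
    return s;
}

double sumUnrolled(const double[] a)
{
    double s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= a.length; i += 4)
    {
        s0 += a[i];      // four independent chains: a wide core (or a
        s1 += a[i + 1];  // VLIW compiler scheduling statically) can
        s2 += a[i + 2];  // issue these adds in the same cycle
        s3 += a[i + 3];
    }
    for (; i < a.length; ++i)
        s0 += a[i];      // handle the remainder
    return (s0 + s1) + (s2 + s3);
}

void main()
{
    auto data = new double[](1 << 20);
    data[] = 1.5;
    writeln(sumChained(data), " ", sumUnrolled(data));
}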

Existing software code bases and a developer culture that is 
resistant to change do limit progress...

People pay to have their existing stuff run well; they won't pay 
if they have to write new stuff in new ways, unless the benefits 
are extreme (e.g. GPUs)...
