H1 2015 Priorities and Bare-Metal Programming

Mon Feb 2 13:44:41 PST 2015

On 2/2/2015 9:06 AM, Johannes Pfau wrote:
> _Dmain:
> 	push   rbp
> 	mov    rbp,rsp
> 	sub    rsp,0x10
> 	mov    rax,0x5                      <==
> 	mov    QWORD PTR [rbp-0x8],rax
> 	mov    ecx,DWORD PTR [rax]          <== a register based load
>
> The instruction it should generate is
> mov ecx, [0x5]

In 64-bit mode, there is no direct addressing like that. The above would be 
relative to the instruction pointer, RIP, and is actually:

    mov ECX, 5[RIP]

So, to load from address 5, you would have to load the address into a register first.

(You'd be right for 32-bit x86. But all 32-bit x86s have an MMU rather 
than direct addressing, and it would be strange to set up an x86 embedded 
system to use MMIO rather than the IO instructions, which are designed for that 
purpose.)

> Not sure if it's actually more efficient on X86 but it makes a huge
> difference on real microcontroller architectures.

What addressing mode is generated by the back end has nothing whatsoever to do 
with using volatileLoad() or pragma(address).

To reiterate, volatileLoad() and volatileStore() are not reordered by the 
optimizer, and replacing them with pragma(address) is not going to make for 
better code generation.
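
For illustration, here is a minimal sketch of a read-modify-write through volatileLoad() and volatileStore(). It is an assumption-laden example, not code from this thread: fakeRegister and setBitsAndRead are hypothetical names, and an ordinary variable stands in for a device register so the program can run on a hosted system. The import uses core.volatile, where current druntime keeps these functions; releases from the time of this post had them in core.bitop.

```d
import core.volatile : volatileLoad, volatileStore;

// Hypothetical stand-in for a memory-mapped register. On real hardware
// the pointer would come from a fixed address, e.g. cast(uint*) 0x5.
__gshared uint fakeRegister;

// Hypothetical helper: set the bits in `mask` and read the register back.
uint setBitsAndRead(uint* reg, uint mask)
{
    uint value = volatileLoad(reg);     // the load is not elided or reordered
    volatileStore(reg, value | mask);   // the store is not elided or reordered
    return volatileLoad(reg);           // loaded again rather than reusing `value | mask`
}

void main()
{
    fakeRegister = 0x10;
    assert(setBitsAndRead(&fakeRegister, 0x01) == 0x11);
}
```

Whether the back end emits an absolute or a register-based load for `reg` is a separate matter from the ordering guarantee shown here.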

The only real issue is the forceinline one.