H1 2015 Priorities and Bare-Metal Programming

Mon Feb 2 14:30:05 PST 2015

On Mon, 02 Feb 2015 13:44:41 -0800,
Walter Bright <newshound2 at digitalmars.com> wrote:

> On 2/2/2015 9:06 AM, Johannes Pfau wrote:
> > _Dmain:
> > 	push   rbp
> > 	mov    rbp,rsp
> > 	sub    rsp,0x10
> > 	mov    rax,0x5                      <==
> > 	mov    QWORD PTR [rbp-0x8],rax
> > 	mov    ecx,DWORD PTR [rax]          <== a register based load
> >
> > The instruction it should generate is
> > mov ecx, [0x5]
> 
> In 64 bit mode, there is no direct addressing like that. The above
> would be relative to the instruction pointer, which is RIP, and is
> actually:
> 
>     mov ECX, 5[RIP]
> 
> So, to load address location 5, you would have to load it into a
> register first.
> 
> (You'd be right for 32 bit x86. But also, all 32 bit x86's have an
> MMU rather than direct addressing, and it would be strange to set up
> the x86 embedded system to use MMIO rather than the IO instructions,
> which are designed for that purpose.)
> 

Well, as I said, it's different on RISC. I'm mainly programming for ARM,
AVR, MSP430 and similar systems, not X86.

> 
> > Not sure if it's actually more efficient on X86 but it makes a huge
> > difference on real microcontroller architectures.
> 
> What addressing mode is generated by the back end has nothing
> whatsoever to do with using volatileLoad() or pragma(address).

It does: if the backend can't know that a value is known at compile time,
it can't use absolute addresses:

void test(ubyte* ptr)
{
    volatileLoad(ptr); // Can't use literal addressing: ptr might be a runtime value
}

The context here is that pragma(address) allows avoiding one wrapper
function. See below.

ARM can code address literals into instructions. So you end up with one
instruction for a load from a compile time known address.
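
To make the distinction concrete, here is a minimal sketch of the two cases. It is only an illustration under some assumptions: fakeReg is a made-up stand-in for a real register so the code can run on a hosted system, loadRuntime and loadFixed are hypothetical helper names, and the import uses core.volatile, where newer druntime versions keep the intrinsics (at the time of this post they lived in core.bitop). Whether the backend really emits an absolute-address load for the second case depends on the target and the optimizer.

```d
import core.volatile : volatileLoad;

// Hypothetical stand-in for a memory-mapped register (ordinary RAM here).
__gshared ubyte fakeReg = 0x2A;

// The address is only known at run time, so the backend has to
// go through a register holding the pointer.
ubyte loadRuntime(ubyte* ptr) nothrow @trusted
{
    return volatileLoad(ptr);
}

// The symbol is bound at compile time, so its address is a link-time
// constant that the backend may encode directly into the load.
ubyte loadFixed(alias reg)() nothrow @trusted
{
    return volatileLoad(&reg);
}

void main()
{
    assert(loadRuntime(&fakeReg) == 0x2A);
    assert(loadFixed!fakeReg() == 0x2A);
}
```

Both calls return the same value; the only difference is how much the backend knows about the address when it generates the load.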

> 
> To reiterate, volatileLoad() and volatileStore() are not reordered by
> the optimizer, and replacing them with pragma(address) is not going
> to make for better code generation.
> 
> The only real issue is the forceinline one.

I think we're talking different languages. Nobody ever proposed
pragma(address) to replace volatileLoad. It's meant to be used together
with the volatile intrinsics like this:
-----------------------------------------------------------------
import core.bitop;

struct Volatile(T)
{
private:
    T _store;

public:
    @disable this(this);

    /**
     * Performs 1 load followed by 1 store
     */
    @attribute("inlineonly") void opOpAssign(string op)(in T rhs)
        nothrow @trusted {
        T val = volatileLoad(&_store);
        mixin("val" ~ op ~ "= rhs;");
        volatileStore(&_store, val);
    }
    //In reality, much more complicated wrappers are possible
    //http://pastebin.com/RGhKdm9i
}

pragma(address, 0x05) extern __gshared Volatile!ubyte PORTA;

//...
PORTA |= 0b0000_0001;
auto addr = &PORTA;
-----------------------------------------------------------------

The pragma(address, 0x05) makes sure that the compiler backend always
knows that PORTA is at 0x05. Things like &PORTA become trivial, and the
compiler backend has exactly the same knowledge as if you'd used C
volatile => all optimizations apply.

And if you call opOpAssign, the backend knows that the this pointer is a
compile-time literal value and generates exactly the same code as if
you wrote
        T val = volatileLoad(cast(T*)0x05);
        mixin("val" ~ op ~ "= rhs;");
        volatileStore(cast(T*)0x05, val);

but if you instead write
@property ref Volatile!ubyte PORTA()
{
    return *(cast(Volatile!(ubyte)*)0x05);
}

PORTA |= now calls a function behind the scenes. The backend does not
immediately know that &PORTA is always 0x05. Also, the this pointer in
opOpAssign is no longer a compile-time constant. And this is where the
constant/runtime value code gen difference discussed above matters.

-O mostly fixes performance problems, but adding an additional property
function is still much uglier than declaring an extern variable with an
address in many ways. (compiler bugs, user-facing code, debug info, ...)

Also, it's a conceptually nice way to get typed registers. You can read it
as: I've got a register of type PORT, which is an extern variable located
at a fixed address. PORT abstracts away the volatile access.
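
A rough sketch of what such a typed register could look like follows. The names Port, setPin, clearPin and isSet are invented for illustration, and the register instance is an ordinary __gshared variable so that the example runs on a PC; on a real target it would be the extern declaration with the proposed pragma(address), which no compiler implements as of this post. The import assumes a druntime that ships core.volatile (older ones have the same intrinsics in core.bitop).

```d
import core.volatile : volatileLoad, volatileStore;

// Hypothetical typed register: callers never see raw pointers or
// the volatile intrinsics, only named operations.
struct Port
{
    private ubyte _store;

    @disable this(this);

    // One volatile load followed by one volatile store.
    void setPin(uint n) nothrow @trusted
    {
        ubyte val = volatileLoad(&_store);
        val |= cast(ubyte)(1u << n);
        volatileStore(&_store, val);
    }

    // One volatile load followed by one volatile store.
    void clearPin(uint n) nothrow @trusted
    {
        ubyte val = volatileLoad(&_store);
        val &= cast(ubyte) ~(1u << n);
        volatileStore(&_store, val);
    }

    // A single volatile load.
    bool isSet(uint n) nothrow @trusted
    {
        return ((volatileLoad(&_store) >> n) & 1) != 0;
    }
}

// Stand-in for: pragma(address, 0x05) extern __gshared Port PORTA;
__gshared Port PORTA;

void main()
{
    PORTA.setPin(0);
    PORTA.setPin(2);
    assert(PORTA.isSet(0) && !PORTA.isSet(1) && PORTA.isSet(2));

    PORTA.clearPin(0);
    assert(!PORTA.isSet(0) && PORTA.isSet(2));
}
```

User code only ever touches PORTA and its methods; where the register lives is a property of the declaration, not of every access.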