Major performance problem with std.array.front()

Adam D. Ruppe destructionator at gmail.com
Thu Mar 6 20:01:14 PST 2014


On Friday, 7 March 2014 at 02:57:38 UTC, Walter Bright wrote:
> Yes, so that the user selects it, rather than having it wired 
> in everywhere and the user has to figure out how to defeat it.

BTW, you know what would help with this? A pragma we can attach to a
struct which makes it a very thin value type.

pragma(thin_struct)
struct A {
    int a;
    int foo() { return a; }
    static A get() { return A(10); }
}

import core.stdc.stdio : printf;

void test() {
    A a = A.get();
    printf("%d", a.foo());
}

With the pragma, A would be completely indistinguishable from int 
in all ways.
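The layout half of this already holds today. Here is a minimal sketch that checks it at compile time. It drops `pragma(thin_struct)` (the proposed pragma, which no current D compiler recognizes) so that it builds, and it confirms that A already has the size and alignment of a bare int. Per the argument that follows, what differs is only how A is passed and returned.

```d
import core.stdc.stdio : printf;

// Same struct as above, minus the proposed pragma(thin_struct),
// which does not exist in D today.
struct A {
    int a;
    int foo() const { return a; }
    static A get() { return A(10); }
}

// The memory layout is already exactly that of an int.
static assert(A.sizeof == int.sizeof);
static assert(A.alignof == int.alignof);

void main() {
    A a = A.get();
    printf("%d\n", a.foo()); // prints 10
}
```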

What do I mean?
$ dmd -release -O -inline test56 -c

Let's look at A.foo:

A.foo:
    0:   55                      push   ebp
    1:   8b ec                   mov    ebp,esp
    3:   50                      push   eax
    4:   8b 00                   mov    eax,DWORD PTR [eax] ; waste!
    6:   8b e5                   mov    esp,ebp
    8:   5d                      pop    ebp
    9:   c3                      ret


It is line four that bugs me: the struct is passed as a 
*pointer*, but its only contents are an int, which could just as 
well be passed as a value. Let's compare it to a function that is
identical in operation:

int identity(int a) { return a; }

00000000 <_D6test568identityFiZi>:
    0:   55                      push   ebp
    1:   8b ec                   mov    ebp,esp
    3:   83 ec 04                sub    esp,0x4
    6:   c9                      leave
    7:   c3                      ret

lol it *still* wastes time, setting up a stack frame for nothing. 
But we could just as well write asm { naked; ret; } and it would 
work as expected: the argument is passed in EAX and the return 
value is expected in EAX. The function doesn't actually have to 
do anything.


Anywho, the struct could work the same way. Now, I understand 
that we can't just change this unilaterally since it would break 
interaction with the C ABI, but we could opt in to some thinner 
stuff with a pragma.


Ideally, the thin struct would generate this code:

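// A.get, as the thin struct would compile it:
static A get() {
    asm {
        naked;       // no need for a stack frame here
        mov EAX, 10; // the thin A comes back in EAX, like an int
        ret;
    }
}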

When A is thin, return A(10); should compile exactly like
return 10;. No need for NRVO; the object is super thin.

// A.foo:
int foo() {
    asm {
        naked; // no locals, no stack frame
        ret;   // the last argument (this) is passed in EAX
               // and the return value goes in EAX
               // so we don't have to do anything
    }
}

Without the thin_struct thing, this would minimally look like

mov EAX, [EAX];
ret;

That extra instruction loads the value through the this pointer.
But since a thin struct is generated identically to an int, like
the identity function above, the value is already in the register!

Then, test:

void test() {
    asm {
        naked;          // don't need a stack frame here either!
        call A.get;     // a is now in EAX, the value loaded right up
        call A.foo;     // this is an int and already where it
                        // needs to be, so just go
        // and finally, go ahead and call printf
        push EAX;
        push "%d".ptr;  // pseudocode: address of the format string
        call printf;
        add ESP, 8;     // cdecl: the caller pops printf's two arguments
        ret;
    }
}


Then, naturally, inlining A.get and A.foo might be possible
(though I'd love to write them in assembly myself, and then the
compiler prolly can't inline them), but call/ret is fairly cheap,
especially when compared to push/pop, so just keeping all the
relevant stuff right in registers with no need to go through
memory can really help us.

pragma(thin_struct)
struct RangedInt {
   int a;
   RangedInt opBinary(string op : "+")(int rhs) {
    asm {
      naked;
      add EAX, [rhs]; // or RDI on 64 bit! Don't even need to touch the stack! **
      jo throw_exception;
      ret;
    }
   }
}


It might still not be as perfect as the intrinsics bearophile is
thinking of... but we'd be getting pretty close. And this kind of
thing would be good for other thin wrappers too; we could
magically make smart pointers as well! (This can't be done now,
since a struct is returned via a hidden pointer argument instead
of in a register like a naked pointer.)

** I'd kinda love it if we had an all-register calling convention
on 32-bit too... but eh, oh well

