Major performance problem with std.array.front()
Adam D. Ruppe
destructionator at gmail.com
Thu Mar 6 20:01:14 PST 2014
On Friday, 7 March 2014 at 02:57:38 UTC, Walter Bright wrote:
> Yes, so that the user selects it, rather than having it wired
> in everywhere and the user has to figure out how to defeat it.
BTW you know what would help this? A pragma we can attach to a
struct which makes it a very thin value type.
import core.stdc.stdio : printf;
pragma(thin_struct)
struct A {
    int a;
    int foo() { return a; }
    static A get() { return A(10); }
}
void test() {
    A a = A.get();
    printf("%d", a.foo());
}
With the pragma, A would be completely indistinguishable from int
in the generated code.
What do I mean?
$ dmd -release -O -inline test56 -c
Let's look at A.foo:
A.foo:
0: 55 push ebp
1: 8b ec mov ebp,esp
3: 50 push eax
4: 8b 00 mov eax,DWORD PTR [eax] ; waste!
6: 8b e5 mov esp,ebp
8: 5d pop ebp
9: c3 ret
It is line four that bugs me: the struct is passed as a
*pointer*, but its only contents are an int, which could just as
well be passed as a value. Let's compare it to an identical
function in operation:
int identity(int a) { return a; }
00000000 <_D6test568identityFiZi>:
0: 55 push ebp
1: 8b ec mov ebp,esp
3: 83 ec 04 sub esp,0x4
6: c9 leave
7: c3 ret
lol it *still* wastes time, setting up a stack frame for nothing.
But we could just as well write asm { naked; ret; } and it would
work as expected: the argument is passed in EAX and the return
value is expected in EAX. The function doesn't actually have to
do anything.
Anywho, the struct could work the same way. Now, I understand
that we can't just change this unilaterally since it would break
interaction with the C ABI, but we could opt in to some thinner
stuff with a pragma.
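Until something like that exists, one rough approximation available today (a sketch, not part of the proposal) is to move foo out of the struct so that it takes A by value, and rely on UFCS so call sites still read a.foo(). This assumes the 32-bit D ABI rule that a last argument fitting in EAX is passed in EAX; whether DMD then actually drops the extra load is something to confirm in the disassembly.

```d
import core.stdc.stdio : printf;

// Same layout as A above, but without the proposed pragma.
struct A {
    int a;
    static A get() { return A(10); }
}

// foo moved out of the struct: it receives A by value instead of
// through the implicit `this` pointer. Assumption: on 32-bit x86 the
// D ABI passes a last argument that fits in EAX in EAX.
int foo(A self) { return self.a; }

void main() {
    A a = A.get();
    // UFCS rewrites a.foo() to foo(a), so call sites look unchanged.
    printf("%d\n", a.foo());
}
```

This does nothing for the return side: A.get still returns a struct, which is the part the pragma would change.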
Ideally, the thin struct would generate this code:
A A.get() {
    naked { // no need for stack frame here
        mov EAX, 10;
        ret;
    }
}
When A is thin, return A(10); should be equivalent to return 10;
with no need for NRVO, since the object is super thin.
int A.foo() {
    naked { // no locals, no stack frame
        ret; // the last argument (this) is passed in EAX
             // and the return value goes in EAX,
             // so we don't have to do anything
    }
}
Without the thin_struct thing, this would minimally look like
mov EAX, [EAX];
ret;
That is, it has to load the value through the this pointer. But
since A is thin, it is generated identically to an int, like the
identity function above, so the value is already in the register!
Then, test:
void test() {
    naked { // don't need a stack frame here either!
        call A.get;
        // a is now in EAX, the value loaded right up
        call A.foo; // the this is an int and already
                    // where it needs to be, so just go
        // and finally, go ahead and call printf
        push EAX;
        push "%d".ptr;
        call printf;
        add ESP, 8; // printf is cdecl: the caller pops its two arguments
        ret;
    }
}
Then, naturally, inlining A.get and A.foo might be possible
(though I'd love to write them in assembly myself* and the
compiler prolly can't inline them), but call/ret is fairly cheap,
especially compared to push/pop, so just keeping all the
relevant stuff right in registers, with no need to dereference
memory, can really help us.
pragma(thin_struct)
struct RangedInt {
    int a;
    RangedInt opBinary(string op : "+")(int rhs) {
        asm {
            naked;
            add EAX, [rhs]; // or RDI on 64 bit! Don't even need to touch the stack! **
            jo throw_exception;
            ret;
        }
    }
}
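For comparison, here is a portable sketch of the same overflow-checked addition with no inline asm and no pragma, using druntime's core.checkedint.adds. It only reproduces the semantics (throw on signed overflow), not the register-level codegen the pragma is meant to deliver; the exception type and message are arbitrary choices for illustration.

```d
import core.checkedint : adds;

// Portable counterpart of the RangedInt above: same behavior,
// ordinary D code, no thin_struct pragma.
struct RangedInt {
    int a;

    RangedInt opBinary(string op : "+")(int rhs) const {
        bool overflow = false;
        // adds sets `overflow` to true if a + rhs overflows int
        int sum = adds(a, rhs, overflow);
        if (overflow)
            throw new Exception("RangedInt overflow"); // plays the role of `jo throw_exception`
        return RangedInt(sum);
    }
}

void main() {
    import std.exception : assertThrown;

    assert((RangedInt(40) + 2).a == 42);
    assertThrown!Exception(RangedInt(int.max) + 1);
}
```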
It might still not be as perfect as the intrinsics bearophile is
thinking of... but we'd be getting pretty close. This kind of
thing would be good for other thin wrappers, too: we could
magically make smart pointers thin as well! (That can't be done
now, since a struct is returned via a hidden pointer argument
instead of in a register like a naked pointer.)
** I'd kinda love it if we had an all-register calling convention
on 32-bit too... but eh, oh well.
More information about the Digitalmars-d
mailing list