Multiple return values...

Iain Buclaw ibuclaw at ubuntu.com
Mon Mar 12 10:49:02 PDT 2012


On 12 March 2012 17:22, Manu <turkeyman at gmail.com> wrote:
> On 12 March 2012 19:03, Iain Buclaw <ibuclaw at ubuntu.com> wrote:
>>
>> On 12 March 2012 00:44, Manu <turkeyman at gmail.com> wrote:
>> > On 12 March 2012 00:58, Robert Jacques <sandford at jhu.edu> wrote:
>> >>
>> >> That's an argument for using the right register for the job. And we
>> >> can / will be doing this on x86-64, as other compilers have already
>> >> done. Manu was arguing that MRV were somehow special and had mystical
>> >> optimization potential. That's simply not true.
>> >
>> >
>> > Here's some tests for you:
>> >
>> > // first test that the argument registers allocate as expected...
>> > int gprtest(int x, int y, int z)
>> > {
>> >     return x+y+z;
>> > }
>> >
>> >    Perfect, ints pass in register sequence, return in r0, no memory
>> >    access
>> > add r0, r0, r1
>> > add r0, r0, r2
>> > bx lr
>> >
>> > float fptest(float x, float y, float z)
>> > {
>> >     return x+y+z;
>> > }
>> >
>> >    Same for floats
>> > fadds s0, s0, s1
>> > fadds s0, s0, s2
>> > bx lr
>> >
>> >
>> > // Some MRV tests...
>> > auto mrv1(int x, int z)
>> > {
>> >     return Tuple!(int, int)(x, z);
>> > }
>> >
>> >   Simple case, 2 ints
>> >   FAIL, stores the 2 arguments it received in regs straight to the
>> >   output struct pointer supplied
>> > stmia r0, {r1, r2}
>> > bx lr
>> >
>> >
>> > auto mrv2(int x, float y, byte z)
>> > {
>> >     return Tuple!(int, float, byte)(x, y, z);
>> > }
>> >
>> >   Different typed things
>> >   EPIC FAIL
>> > stmfd sp!, {r4, r5}
>> > mov ip, #0
>> > sub sp, sp, #24
>> > mov r4, r2
>> > str ip, [sp, #12]
>> > str ip, [sp, #20]
>> > ldr r2, .L27
>> > add ip, sp, #24
>> > mov r3, r0
>> > mov r5, r1
>> > str r2, [sp, #16] @ float
>> > ldmdb ip, {r0, r1, r2}
>> > stmia r3, {r0, r1, r2}
>> > fsts s0, [r3, #4]
>> > stmia sp, {r0, r1, r2}
>> > str r5, [r3, #0]
>> > strb r4, [r3, #8]
>> > mov r0, r3
>> > add sp, sp, #24
>> > ldmfd sp!, {r4, r5}
>> > bx lr
>> >
>> >
>> > auto range(int *p)
>> > {
>> >     return p[0..1];
>> > }
>> >
>> >   Range
>> >   SURPRISE FAIL, even a range is returned as a struct! O_O
>> > mov r2, #1
>> > str r2, [r0, #0]
>> > str r1, [r0, #4]
>> > bx lr
>> >
>> >
>> > So the D ABI is a complete shambles on ARM!
>> > Unsurprisingly, it all just follows the return struct by-val ABI, which
>> > is to write it to the stack unconditionally. And sadly, it even thinks
>> > the internal types like range+delegate are just a struct by-val, and
>> > completely ruins those!
>> >
>> > Let's try again with x86...
>> >
>> >
>> > auto mrv1(int x, int z)
>> > {
>> >     return Tuple!(int, int)(x, z);
>> > }
>> >
>> > Returns in eax/edx as expected
>> >  movl 4(%esp), %eax
>> >  movl 8(%esp), %edx
>> >
>> >
>> > auto mrv2(int x, float y, int z)
>> > {
>> >     return Tuple!(int, float, int)(x, y, z);
>> > }
>> >
>> > FAIL! All written to a struct rather than returning in eax, edx, st0.
>> > This is C ABI baggage, D can do better.
>> >  movl 4(%esp), %eax
>> >  movl 8(%esp), %edx
>> >  movl %edx, (%eax)
>> >  movl 12(%esp), %edx
>> >  movl %edx, 4(%eax)
>> >  movl 16(%esp), %edx
>> >  movl %edx, 8(%eax)
>> >  ret $4
>> >
>> >
>> > auto range(int *p)
>> > {
>> >     return p[0..1];
>> > }
>> >
>> > Obviously, the small struct optimisation allows this to work properly
>> >  movl $1, %eax
>> >  movl 4(%esp), %edx
>> >  ret
>> >
>> >
>> > All that said, x86 isn't a good test case, since all args are ALWAYS
>> > passed on the stack. x64 would be a much better test since it actually
>> > has arg registers, but I'm on Windows, so no x64 for me...
>>
>>
>> What compiler flags are you using here?  For x86, I would have thought
>> that small structs (< 8 bytes) would be passed back in registers...
>> only speculating though - will need to see what codegen is being built
>> from the D code provided to be sure.
>
>
> -S -O2 -msse2
> And as expected, 8-byte structs were returned packed in registers from my
> examples above. That's a traditional x86 ABI hack which conveniently allows
> delegates+ranges to work well on x86, but as you can see, they're proper
> broken on other architectures.

OK, -msse2 is not an ARM target option. :~)


Looking around, the "Procedure Call Standard for the ARM Architecture"
specifically says (section 5.4: Result Return):

"A Composite Type not larger than 4 bytes is returned in R0."

"A Composite Type larger than 4 bytes ... is stored in memory at an
address passed as an extra argument when the function was called ..."
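
To make that concrete, here is a minimal sketch of how those two
sentences, taken on their own, would classify the result types from the
tests quoted above. The names ReturnClass and classifyComposite are made
up for illustration; they are not part of any compiler or of the
standard. The sketch looks at size only and ignores anything else the
document may say (co-processor register variants, for instance).

```d
import std.stdio : writeln;
import std.typecons : Tuple;

/// Where a composite result ends up, going only by the two quoted sentences.
enum ReturnClass { inR0, viaHiddenPointer }

/// Hypothetical helper (not a compiler or library API): classify by size alone.
ReturnClass classifyComposite(size_t sizeInBytes) pure nothrow @safe
{
    return sizeInBytes <= 4 ? ReturnClass.inR0 : ReturnClass.viaHiddenPointer;
}

// Checked at compile time through CTFE. Each type from the tests is larger
// than 4 bytes, whether size_t is 32 or 64 bits wide.
static assert(classifyComposite(Tuple!(int, int).sizeof) == ReturnClass.viaHiddenPointer);
static assert(classifyComposite(Tuple!(int, float, byte).sizeof) == ReturnClass.viaHiddenPointer);
static assert(classifyComposite((int[]).sizeof) == ReturnClass.viaHiddenPointer);

// For contrast, a composite that fits in 4 bytes.
static assert(classifyComposite(Tuple!(short, short).sizeof) == ReturnClass.inR0);

void main()
{
    writeln("Tuple!(int, int):         ", classifyComposite(Tuple!(int, int).sizeof));
    writeln("Tuple!(int, float, byte): ", classifyComposite(Tuple!(int, float, byte).sizeof));
    writeln("int[]:                    ", classifyComposite((int[]).sizeof));
    writeln("Tuple!(short, short):     ", classifyComposite(Tuple!(short, short).sizeof));
}
```

If the quoted rule is the whole story, every result type in the tests is
over the 4-byte limit, so a result pointer passed as an extra argument is
what the standard asks for in each case.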



Feel free to correct me if that document is slightly out of date.


-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';

