Multiple return values...

Mon Mar 12 10:50:39 PDT 2012

On 12 March 2012 17:49, Iain Buclaw <ibuclaw at ubuntu.com> wrote:
> On 12 March 2012 17:22, Manu <turkeyman at gmail.com> wrote:
>> On 12 March 2012 19:03, Iain Buclaw <ibuclaw at ubuntu.com> wrote:
>>>
>>> On 12 March 2012 00:44, Manu <turkeyman at gmail.com> wrote:
>>> > On 12 March 2012 00:58, Robert Jacques <sandford at jhu.edu> wrote:
>>> >>
>>> >> That's an argument for using the right register for the job. And we
>>> >> can / will be doing this on x86-64, as other compilers have already
>>> >> done. Manu was arguing that MRV were somehow special and had mystical
>>> >> optimization potential. That's simply not true.
>>> >
>>> >
>>> > Here's some tests for you:
>>> >
>>> > // first test that the argument registers allocate as expected...
>>> > int gprtest(int x, int y, int z)
>>> > {
>>> > return x+y+z;
>>> > }
>>> >
>>> >    Perfect, ints pass in register sequence, return in r0, no memory access
>>> > add r0, r0, r1
>>> > add r0, r0, r2
>>> > bx lr
>>> >
>>> > float fptest(float x, float y, float z)
>>> > {
>>> > return x+y+z;
>>> > }
>>> >
>>> >    Same for floats
>>> > fadds s0, s0, s1
>>> > fadds s0, s0, s2
>>> > bx lr
>>> >
>>> >
>>> > // Some MRV tests...
>>> > auto mrv1(int x, int z)
>>> > {
>>> > return Tuple!(int, int)(x, z);
>>> > }
>>> >
>>> >   Simple case, 2 ints
>>> >   FAIL, stores the 2 arguments it received in regs straight to output
>>> >   struct pointer supplied
>>> > stmia r0, {r1, r2}
>>> > bx lr
>>> >
>>> >
>>> > auto mrv2(int x, float y, byte z)
>>> > {
>>> > return Tuple!(int, float, byte)(x, y, z);
>>> > }
>>> >
>>> >   Different typed things
>>> >   EPIC FAIL
>>> > stmfd sp!, {r4, r5}
>>> > mov ip, #0
>>> > sub sp, sp, #24
>>> > mov r4, r2
>>> > str ip, [sp, #12]
>>> > str ip, [sp, #20]
>>> > ldr r2, .L27
>>> > add ip, sp, #24
>>> > mov r3, r0
>>> > mov r5, r1
>>> > str r2, [sp, #16] @ float
>>> > ldmdb ip, {r0, r1, r2}
>>> > stmia r3, {r0, r1, r2}
>>> > fsts s0, [r3, #4]
>>> > stmia sp, {r0, r1, r2}
>>> > str r5, [r3, #0]
>>> > strb r4, [r3, #8]
>>> > mov r0, r3
>>> > add sp, sp, #24
>>> > ldmfd sp!, {r4, r5}
>>> > bx lr
>>> >
>>> >
>>> > auto range(int *p)
>>> > {
>>> > return p[0..1];
>>> > }
>>> >
>>> >   Range
>>> >   SURPRISE FAIL, even a range is returned as a struct! O_O
>>> > mov r2, #1
>>> > str r2, [r0, #0]
>>> > str r1, [r0, #4]
>>> > bx lr
>>> >
>>> >
>>> > So the D ABI is a complete shambles on ARM!
>>> > Unsurprisingly, it all just follows the return struct by-val ABI, which
>>> > is to write it to the stack unconditionally. And sadly, it even thinks
>>> > the internal types like range+delegate are just a struct by-val, and
>>> > completely ruins those!
>>> >
>>> > Let's try again with x86...
>>> >
>>> >
>>> > auto mrv1(int x, int z)
>>> > {
>>> > return Tuple!(int, int)(x, z);
>>> > }
>>> >
>>> > Returns in eax/edx as expected
>>> >  movl 4(%esp), %eax
>>> >  movl 8(%esp), %edx
>>> >
>>> >
>>> > auto mrv2(int x, float y, int z)
>>> > {
>>> > return Tuple!(int, float, int)(x, y, z);
>>> > }
>>> >
>>> > FAIL! All written to a struct rather than returning in eax,edx,st0 ..
>>> > This is C ABI baggage, D can do better.
>>> >  movl 4(%esp), %eax
>>> >  movl 8(%esp), %edx
>>> >  movl %edx, (%eax)
>>> >  movl 12(%esp), %edx
>>> >  movl %edx, 4(%eax)
>>> >  movl 16(%esp), %edx
>>> >  movl %edx, 8(%eax)
>>> >  ret $4
>>> >
>>> >
>>> > auto range(int *p)
>>> > {
>>> > return p[0..1];
>>> > }
>>> >
>>> > Obviously, the small struct optimisation allows this to work properly
>>> >  movl $1, %eax
>>> >  movl 4(%esp), %edx
>>> >  ret
>>> >
>>> >
>>> > All that said, x86 isn't a good test case, since all args are ALWAYS
>>> > passed on the stack. x64 would be a much better test since it actually
>>> > has arg registers, but I'm on windows, so no x64 for me...
>>>
>>>
>>> What compiler flags are you using here?  For x86, I would have thought
>>> that small structs (< 8 bytes) would be passed back in registers...
>>> only speculating though - will need to see what codegen is being built
>>> from the D code provided to be sure.
>>
>>
>> -S -O2 -msse2
>> And as expected, 8-byte structs were returned packed in registers from my
>> examples above. That's a traditional x86 ABI hack which conveniently allows
>> delegates+ranges to work well on x86, but as you can see, they're proper
>> broken on other architectures.
>
> OK, -msse2 is not an ARM target option. :~)
>
>
> Looking around, the "Procedure Call Standard for the ARM Architecture"
> specifically says (section 5.4: Result Return):
>
> "A Composite Type not larger than 4 bytes is returned in R0."
>
> "A Composite Type larger than 4 bytes ... is stored in memory at an
> address passed as an extra argument when the function was called ..."
>
>
>
> Feel free to correct me if that document is slightly out of date.
>
>
> --
> Iain Buclaw
>
> *(p < e ? p++ : p) = (c & 0x0f) + '0';


Link:  http://infocenter.arm.com/help/topic/com.arm.doc.ihi0042d/IHI0042D_aapcs.pdf
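
To make the two quoted sentences concrete, here is a minimal C sketch of
the size test they describe. It is only an illustration: the struct names
(pair, triple, small) and the function aapcs_composite_return are made up
for this mail, the structs are rough C stand-ins for the D return types in
the tests above, and the sketch models nothing beyond the two quoted
sentences (not the elided part of the quote, nor any other rule in the
document).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C stand-ins for the D return types tested above. */
struct pair   { int32_t x; int32_t z; };          /* like Tuple!(int, int) */
struct triple { int32_t x; float y; int8_t z; };  /* like Tuple!(int, float, byte) */
struct small  { int16_t a; int8_t b; };           /* padded to 4 bytes on common ABIs */

/* Where a composite result goes, per the two quoted sentences only. */
enum ret_class { RET_IN_R0, RET_IN_MEMORY };

/* Assumed helper: "not larger than 4 bytes" goes in R0, anything larger
 * is written through an address passed as an extra argument. */
enum ret_class aapcs_composite_return(size_t size)
{
    return size <= 4 ? RET_IN_R0 : RET_IN_MEMORY;
}

/* Self-check: both tuple stand-ins exceed 4 bytes, so both go to memory. */
void check_examples(void)
{
    assert(aapcs_composite_return(sizeof(struct pair))   == RET_IN_MEMORY);
    assert(aapcs_composite_return(sizeof(struct triple)) == RET_IN_MEMORY);
    assert(aapcs_composite_return(sizeof(struct small))  == RET_IN_R0);
}
```

Read that way, the stmia/str sequences in the ARM listings above are what
the quoted rule asks for with any composite larger than 4 bytes, which
would include a two-word slice.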


-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';