Multiple return values...

Mon Mar 12 14:37:22 PDT 2012

On Mon, 12 Mar 2012 04:25:54 -0500, Manu <turkeyman at gmail.com> wrote:
> On 12 March 2012 04:00, Robert Jacques <sandford at jhu.edu> wrote:
>
>> On Sun, 11 Mar 2012 18:15:31 -0500, Timon Gehr <timon.gehr at gmx.ch> wrote:
>>
>>  On 03/11/2012 11:58 PM, Robert Jacques wrote:
>>>
>>>> Manu was arguing that MRV were somehow special and had mystical
>>>> optimization potential. That's simply not true.
>>>>
>>>
>>> Not exactly mystical, but it is certainly there.
>>>
>>> void main(){
>>>     auto a = foo(); // MRV/struct return
>>>     bar(&a.x); // defined in a different compilation unit
>>> }
>>>
>>> struct return has to write out the whole struct on the stack because of
>>> layout guarantees, probably making the optimized struct return calling
>>> convention somewhat slower for this case. The same does not hold for MRV.
>>>
>>
>> The layout of the struct only has to exist _when_ the address is taken.
>> Before that, the compiler/language/optimizer is free to (and does) do
>> whatever it want. Besides, in your example only the address of a field is
>> taken, the compiler will optimize away all the other pieces a (dead
>> variable elimination).
>>
>
> No, it can't. That's the point. It must preserve the struct in case you
> fiddle with the pointer. Taking the pointer is explicit in this case, but
> if you passed anything in the struct to another function by ref, you've
> setup the same scenario.

Okay, to be clear: once a struct is returned, the optimizer can do anything it wants with it. Certain compilers are extremely aggressive about this because it matters on their hardware. C and C++ compilers do this today, so yes, compilers can.

> Wait, ARM?! That's really cool. However, as far as I know, D on ARM is very
>> experimental. Having an experimental compiler not eke out every last cycle
>> is not something that should be unexpected.
>>
>> That said, I'm not sure what point you were trying to make, aside from
>> backend quality-of-implementation issues. I think bringing these issues up
>> is important, but they are tangent to the language changes you're asking
>> for.
>>
>
> This is using GCC's backend which is not really experimental, it has
> decades of field use. The point here is that we are seeing the effect of
> the C ABI applied directly to this problem, and it's completely un-workable.
> I'm trying to show that D needs to declare something of an ABI promise when
> applied to this problem if it is to be a useful+efficient feature. Again, C
> can't express this problem, and we won't get any value from the C ABI to
> make this construct efficient, but a very simple and efficient solution does
> exist.

GCC is a very large collection of things, and its backend has a general reputation of being second place to the commercial vendors by a decent margin (25+%) and, I think, also to LLVM. I was referring more to GDC's mapping to the GCC ARM backend and the associated runtime issues, etc.

As for a simple and efficient solution existing: show me an academic paper or a compiler that gets it right. Then show me the study on a large codebase showing that it's actually more efficient. Then we will listen. Until then, I'm liable to trust existing wisdom.

> Why should D place this constraint on future compilers? D currently only
>> specifies the ABI for x86. I'm fairly sure it would follow the best
>> practices for each of the other architectures, but none of them have been
>> established yet.
>>
>
> Constraint? Perhaps you mean 'liberation'...
> The x86 ABI is not a *best* practise by a long shot. It is only banking on
> a traditional x86 trick for small structs.

Let us assume for a moment that the x86 design is good for x86 but terrible for ARM, and vice versa. Why should either backend do something subpar for the sake of the other? Generating code for an in-order (IOE) CPU vs. an out-of-order (OOE) CPU vs. a stack machine vs. a register machine are all very different operations, and the backend should be at liberty to do whatever is best.

> I was giving you an example that seemed to satisfy your complaints. And
>> no, actually it can't return in those registers at zero cost. There is a
>> reason why we don't use all the registers to both pass and return
>> arguments: we need some registers free to work on them both before and
>> after the call.
>
>
> "D should define an MRV ABI which is precisely the ABI for passing multiple
> args TO a function, but in reverse, for any given architecture." .. I've
> never said anything about using ALL the registers, I say to use all the
> ARGUMENT registers.
> On x64, that is 4 GPR regs, and 4 XMM regs.

The point is that increasing the number of return registers isn't free, and that simply matching whatever number is best for argument registers is not, ipso facto, ideal for returns.

>
> I know Go has MRV. What does its ABI look like? What does ARM prefer? I'd
>> recommend citing some papers or a compiler or something. Otherwise, it
>> looks like you're ignoring the wisdom of the masses or simply ignorant.
>>
>
> I don't have a Go toolchain, do you wanna run my tests above?
> Are you suggesting I have no idea what I'm talking about with respect to
> efficient calling conventions? The very fastest way is to return in the
> registers designed for the job. This is true for x64, ARM, everything. What
> to do when you exceed the argument register limit is a question for each
> architecture, but I maintain it should behave exactly as it does when
> calling a function, this way you create the possibility of super-efficient
> chain-calls.

But the return itself _isn't_ the core measure of the performance of the ABI. If the callee has to evacuate some of those return registers in order to compute the other return values, it then has to copy the evacuated values back into the registers unnecessarily. Similarly, if the caller has to evacuate any of the return registers in order to compute its next value, unnecessary copies happen.

> LLVM has support for MRV how I describe:
>
> The biggest change in LLVM 2.3 is Multiple Return Value (MRV) support. MRVs
> allow LLVM IR to directly represent functions that return multiple values
> without having to pass them "by reference" in the LLVM IR. This allows a
> front-end to generate more efficient code, *as MRVs are generally returned
> in registers if a target supports them*. See the LLVM IR
> Reference<http://llvm.org/releases/2.3/docs/LangRef.html#i_getresult>
> for
> more details.

Thanks for looking this up. From the reference:

   %struct.A = type { i32, i8 }
   %r = call %struct.A @foo()
   %gr = getresult %struct.A %r, 0    ; yields i32:%gr
   %gr1 = getresult %struct.A %r, 1   ; yields i8:%gr1
   add i32 %gr, 42
   add i8 %gr1, 41

and

The 'getresult' instruction takes a call or invoke value as its first argument, or an undef value. The value must have structure type. The second argument is a constant unsigned index value which must be in range for the number of values returned by the call.

It would appear that LLVM implements MRV via structs. Furthermore, I'm not sure what they mean by "by reference", but I know some languages implement MRV using arrays.

> MRVs are fully supported in the LLVM IR, but are not yet fully supported
> on all targets. However, it is generally safe to return up to 2 values from
> a function: most targets should be able to handle at least that. MRV
> support is a critical requirement for X86-64 ABI support, as X86-64
> requires the ability to return multiple registers from functions, and we
> use MRVs to accomplish this *in a direct way*.
> In this case, if we have the expression defined in the language (the other
> guys have convinced me we do, via tuples), it's conceivable the front end
> could present it to LLVM in such a way that it can produce great code
> already.

Digging into the x86-64 ABI, what they are talking about is the ability to support the two return GPRs and the two return XMM registers; i.e., the C ABI. Also, the documentation on this differs between systems.
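
For concreteness, here is a minimal sketch in D of the kind of small-struct return the x86-64 C ABI already covers. DivMod and divmod are hypothetical names made up for illustration, and the comments describe what the calling conventions specify, not measured output from any particular D compiler.

```d
import std.stdio : writeln;

// Hypothetical 16-byte aggregate holding two 64-bit integers.
struct DivMod
{
    long quot;
    long rem;
}

// extern(C) asks for the platform's C calling convention.
// Under the System V x86-64 convention a struct like this is small
// enough to come back in the two integer return registers (RAX and RDX).
// The Windows x64 convention instead returns a 16-byte struct through
// a hidden pointer, which is one way the documentation differs by system.
extern (C) DivMod divmod(long a, long b)
{
    return DivMod(a / b, a % b);
}

void main()
{
    auto r = divmod(17, 5);
    assert(r.quot == 3 && r.rem == 2);
    writeln(r.quot, " ", r.rem); // prints "3 2"
}
```

Two values come back without any language-level MRV construct, which is all the quoted LLVM note is describing as far as I can tell.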

> P.S. The fun(gun()) case is interesting, but it seems like a corner case.
>> Designing the ABI around it feels wrong, if it hurts performance elsewhere.
>>
>
> It's certainly not the goal of the feature, just a nice little side effect.
> And the MRV feature itself certainly doesn't hurt anything else
> anywhere...
> The whole thing is a feature that is missing from the C ABI, because C
> simply can't express the concept, so there's never been a reason to define
> it.

Except, the documentation you've linked to _is_ the C ABI.