Multiple return values...

Sun Mar 11 15:50:58 PDT 2012

On Sun, 11 Mar 2012 05:56:03 -0500, Manu <turkeyman at gmail.com> wrote:
> On 11 March 2012 03:45, Robert Jacques <sandford at jhu.edu> wrote:
>> On Sat, 10 Mar 2012 19:27:05 -0600, Manu <turkeyman at gmail.com> wrote:
>>> On 11 March 2012 00:25, Sean Cavanaugh <WorksOnMyMachine at gmail.com>
>>> wrote:
>>>  On 3/10/2012 4:37 AM, Manu wrote:

[snip]

>> Manu, please go read the D ABI (http://dlang.org/abi.html). Remember,
>> your example of returning two values using Tuple vs 'real' MRV? The D ABI
>> states that those values will be returned via registers. Returning
>> something larger? Then NRVO kicks in, which gives you a zero-copy
>> approach. On x86-64 these limits are different, since you have more
>> registers to play with, but the concept is the same. In fact, returning
>> arguments has always been more efficient than passing arguments.
>>
>
> Please go read my prior posts. This has absolutely no bearing on what I'm
> talking about. In fact, it fuels my argument in some cases.
>
> That said, I've read it, and it scares the hell out of me. D states that a
> small struct may be returned in up to 2 registers (8 bytes/16 bytes), I
> suspect this is a hack introduced specifically to make ranges efficient?
> If it is not simply a struct of 2 register sized things, they get packed
> into a magic 8-16 byte struct implicitly for return, and this makes my
> brain explode. If I wanted to perform a bunch of shifts, or's, and and's
> until it's snugly packed into a 8-16 byte block before returning, I'd do
> that explicitly... I DON'T want to do that however, and I *really* don't
> want the language doing that for me. If I want to return 4 byte values,
> they should be returned in 4 registers, not packed together into a 32bit
> word and returned in one. Also, if there are mixed ints, floats, vectors,
> even other structs, none of it works; what if there is float data? Swapping
> registers now? That's one of the worst penalties there is.

Except that a 4-byte struct is packed into a single memory word, and it's impossible to return it using 4 registers in D (EBX is treated specially; I think it's for TLS and stuff, but I'm not sure). As for the rest, all of these low-level optimizations are standard C++ techniques dating back to when computers were still 16-bit. Walter has updated the D backend as little as possible, so pretty much everything comes from how DMC++ did things. It's also how Microsoft, Apple and *nix do things.
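To make the 4-byte case concrete, here is a minimal C sketch. It uses C rather than D because the GCC x86-64 rules quoted in this thread are C-level. The type `rgba8` and the function `make_rgba8` are made-up names for illustration. Assuming a SysV x86-64 target with GCC or Clang, compiling this with `-O2 -S` should show the four bytes being combined into a single register (EAX) before the return. That is the packing described above; it does not use one register per field.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical example type: four 1-byte fields, 4 bytes total. */
typedef struct { uint8_t r, g, b, a; } rgba8;

/* C11 compile-time check that there is no padding. */
static_assert(sizeof(rgba8) == 4, "expected a 4-byte struct");

/* Assumption: SysV x86-64. This struct fits in one 64-bit part and is
   classified INTEGER, so it is expected to come back packed in the low
   32 bits of RAX (i.e. EAX), not in four separate registers. */
rgba8 make_rgba8(uint8_t r, uint8_t g, uint8_t b, uint8_t a) {
    rgba8 c = { r, g, b, a };
    return c;
}
```

Any register-level difference only shows up in the generated assembly. At the source level, the caller just sees a struct returned by value.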

GCC-64, for example, does the following: "Entire object is returned in integer registers and/or XMM registers if the size is no bigger than 128 bits, otherwise on the stack. Each 64-bit part of the object is transferred in an XMM register if it contains only float or double, or in an integer register if it contains integer types or mixed integer and float. Two consecutive float’s can be packed into the lower half of one XMM register. Consecutive double’s are not packed. No more than 64 bits of each XMM register is used. Use P if not enough vacant registers. P: Pointer to temporary memory space passed to function. Pointer may be passed in register if fastcall or 64-bit mode, otherwise on stack. Same pointer is returned in AX, EAX or RAX."
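The per-64-bit-part rule quoted above can be sketched as a small classifier. This is a simplified model, not the full System V algorithm, and it rests on several assumptions:

* The `field` descriptor and `classify` function are hypothetical.
* Only scalar int/float/double fields are modeled.
* Fields are assumed to be naturally aligned, so none straddles an 8-byte boundary.
* The "not enough vacant registers" fallback to P is ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical field descriptor: scalar fields only, naturally aligned. */
typedef enum { F_INT, F_FLOAT, F_DOUBLE } field_kind;
typedef struct { field_kind kind; size_t offset; size_t size; } field;

/* NONE = that 64-bit part holds no fields (absent or padding only). */
typedef enum { CLASS_NONE, CLASS_INTEGER, CLASS_SSE, CLASS_MEMORY } arg_class;

/* Classify the (at most two) 64-bit parts of a struct, per the quoted rule:
   larger than 128 bits -> memory; a part with only float/double -> XMM (SSE);
   a part with integers, or a mix of integer and float -> integer register. */
static void classify(const field *fs, size_t n, size_t struct_size,
                     arg_class out[2]) {
    out[0] = out[1] = CLASS_NONE;
    if (struct_size > 16) {            /* bigger than 128 bits: on the stack */
        out[0] = out[1] = CLASS_MEMORY;
        return;
    }
    for (size_t i = 0; i < n; i++) {
        assert(fs[i].offset + fs[i].size <= struct_size);
        size_t part = fs[i].offset / 8;  /* which 64-bit part holds this field */
        arg_class c = (fs[i].kind == F_INT) ? CLASS_INTEGER : CLASS_SSE;
        if (out[part] == CLASS_NONE)
            out[part] = c;
        else if (out[part] != c)
            out[part] = CLASS_INTEGER;   /* mixed integer and float */
    }
}
```

Under this model, `struct { float x, y; }` gives SSE for its single part (two floats packed in one XMM register). `struct { int i; float f; }` gives INTEGER, because the part is mixed. `struct { double d; int i; }` gives SSE then INTEGER. Any struct over 16 bytes gives MEMORY.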

Manu, if this is truly only breaking your head now, then all I can conclude is that you opened this discussion on low-level function return optimizations with absolutely zero knowledge of the subject matter you were trying to discuss; I find this very frustrating.