Multiple return values...

Sun Mar 11 16:30:33 PDT 2012

On 12 March 2012 00:50, Robert Jacques <sandford at jhu.edu> wrote:

> On Sun, 11 Mar 2012 05:56:03 -0500, Manu <turkeyman at gmail.com> wrote:
>
>> On 11 March 2012 03:45, Robert Jacques <sandford at jhu.edu> wrote:
>>
>>> On Sat, 10 Mar 2012 19:27:05 -0600, Manu <turkeyman at gmail.com> wrote:
>>>
>>>> On 11 March 2012 00:25, Sean Cavanaugh <WorksOnMyMachine at gmail.com>
>>>> wrote:
>>>>  On 3/10/2012 4:37 AM, Manu wrote:
>>>>
>>>
> [snip]
>
>>> Manu, please go read the D ABI (http://dlang.org/abi.html). Remember,
>>> your example of returning two values using Tuple vs 'real' MRV? The D ABI
>>> states that those values will be returned via registers. Returning
>>> something larger? Then the NRVO kicks in, which gives you a zero-copy
>>> approach. On x86-64 these limits are different, since you have more
>>> registers to play with, but the concept is the same. In fact, returning
>>> arguments has always been more efficient than passing arguments.
>>>
>>>
>> Please go read my prior posts. This has absolutely no bearing on what I'm
>> talking about. In fact, it fuels my argument in some cases.
>>
>> That said, I've read it, and it scares the hell out of me. D states that a
>> small struct may be returned in up to 2 registers (8 bytes/16 bytes), I
>> suspect this is a hack introduced specifically to make ranges efficient?
>> If it is not simply a struct of 2 register sized things, they get packed
>> into a magic 8-16 byte struct implicitly for return, and this makes my
>> brain explode. If I wanted to perform a bunch of shifts, or's, and and's
>> until it's snugly packed into a 8-16 byte block before returning, I'd do
>> that explicitly... I DON'T want to do that however, and I *really* don't
>> want the language doing that for me. If I want to return 4 byte values,
>> they should be returned in 4 registers, not packed together into a 32bit
>> word and returned in one. Also, if there are mixed ints, floats, vectors,
>> even other structs, none of it works; what if there is float data?
>> Swapping registers now? That's one of the worst penalties there is.
>>
>
> Except that a 4 byte struct is packed into a single memory word and that
> it's impossible to return using 4 registers in D (EBX is treated specially;
> I think it's for TLS and stuff, but I'm not sure). As for the rest, all of
> these low level optimizations are standard C++ techniques back from when
> computers were still 16-bits. Walter has updated the D backend as little as
> possible, so pretty much everything comes from how DMC++ did things. It's
> also how Microsoft, Apple and *nix do things.
>

x86 is not the only architecture on earth; it's arguably not even a
particularly important one commercially with respect to realtime software.
Also, from a codegen point of view, it's usually the least important
architecture by a mile, since x86 chips don't actually run the code they
receive; they reinterpret it to some microcode and perform a crap load of
clever optimisations in the process. Every other architecture has many more
registers, and uses many more of them for passing args. They also suffer
much greater penalties than x86 in general.

What I'm talking about is a language feature that implements (and
guarantees) a policy to return many things in EXACTLY the same way that it
passes many args TO a function. This is already the most efficient way to
pass many things between functions in any given architecture.

In addition to that, I'm also discussing the usefulness of a nice sugary
syntax to do this at the same time, for clarity and productivity.
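
To make the source-level shape of this concrete, here is a minimal sketch in
C++17 (not D, and not a proposed syntax). The names `DivMod`, `div_mod` and
`demo` are hypothetical and exist only for illustration. Note that the code
says nothing about which registers are used: that is decided by the target
ABI and the compiler, which is exactly the part a guaranteed MRV ABI would
pin down.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical pair of results; names are illustrative only.
struct DivMod {
    std::int32_t quotient;
    std::int32_t remainder;
};

// Returns two values at once. Whether they come back packed into one
// register, split across two, or through a hidden pointer is up to the
// target ABI, not this source code.
DivMod div_mod(std::int32_t a, std::int32_t b) {
    return DivMod{a / b, a % b};
}

// Structured bindings give MRV-like sugar at the call site.
void demo() {
    auto [q, r] = div_mod(17, 5);
    assert(q == 3 && r == 2);
}
```

The call site reads like a multiple return, but the language only promises
the values, not how they are placed on the way back.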

> GCC-64, for example, does the following: "Entire object is returned in
> integer registers and/or XMM registers if the size is no bigger than 128
> bits, otherwise on the stack. Each 64-bit part of the object is
> transferred in an XMM register if it contains only float or double, or in
> an integer register if it contains integer types or mixed integer and
> float. Two consecutive float’s can be packed into the lower half of one XMM
> register.

They shouldn't need to be packed into anything; they already live in their
own registers. The ABI specifies a certain number of argument registers, and
it can return in those at zero cost.
And again, you're only considering x86.

> Consecutive double’s are not packed. No more than 64 bits of each XMM
> register is used. Use P if not enough vacant registers. P: Pointer to
> temporary memory space passed to function. Pointer may be passed in
> register if fastcall or 64-bit mode, otherwise on stack. Same pointer is
> returned in AX, EAX or RAX."
>

Wasteful; it still has more arg registers available. It's limiting itself
by traditional C conventions.

> Manu, if this is truly only breaking your head now, then all I can conclude
> is that you opened this discussion on low level function return
> optimizations with absolutely zero knowledge of the subject matter you were
> trying to discuss; I find this very frustrating.
>

I have 15+ years of experience with C on basically every architecture you
could name; I'm absolutely aware of what C ABIs look like with regard to
returning structures by value.
I find the common assumption that all computers on earth are x86 perhaps
even more frustrating, along with the fact that you've missed the point
about using ALL the argument registers to avoid pointless packing/unpacking
and memory access entirely.
C is incapable of expressing MRV and doesn't have an ABI for it, so talking
about C compiler optimisation tricks is irrelevant. D should define an MRV
ABI which is precisely the ABI for passing multiple args TO a function, but
in reverse, for any given architecture. This also has the lovely side
effect of guaranteeing correct argument placement for chain-called
functions.
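
As a rough source-level illustration of the chain-calling point, the C++17
sketch below forwards one function's multiple results straight into another
function's parameters using `std::apply`. The functions `make_args`,
`consume`, `chained` and `demo_chain` are hypothetical. C++ gives no
guarantee that the values stay in the argument registers between the two
calls; the sketch only shows the call pattern that a matching MRV ABI could
make free.

```cpp
#include <cassert>
#include <tuple>

// Hypothetical producer: returns three values together.
std::tuple<int, int, float> make_args() {
    return std::make_tuple(2, 3, 0.5f);
}

// Hypothetical consumer: takes the same three values as ordinary arguments.
float consume(int a, int b, float scale) {
    return static_cast<float>(a + b) * scale;
}

// Chain call: the tuple's elements are passed on as consume's arguments.
float chained() {
    return std::apply(consume, make_args());
}

void demo_chain() {
    assert(chained() == 2.5f);  // (2 + 3) * 0.5
}
```

If the return convention mirrored the argument convention, the results of
`make_args` would in principle already sit where `consume` expects them.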