<div class="gmail_quote">On 12 March 2012 00:50, Robert Jacques <span dir="ltr">&lt;<a href="mailto:sandford@jhu.edu">sandford@jhu.edu</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div class="im">On Sun, 11 Mar 2012 05:56:03 -0500, Manu &lt;<a href="mailto:turkeyman@gmail.com" target="_blank">turkeyman@gmail.com</a>&gt; wrote:<br>

</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">

On 11 March 2012 03:45, Robert Jacques &lt;<a href="mailto:sandford@jhu.edu" target="_blank">sandford@jhu.edu</a>&gt; wrote:<br>

</div><div class="im"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

On Sat, 10 Mar 2012 19:27:05 -0600, Manu &lt;<a href="mailto:turkeyman@gmail.com" target="_blank">turkeyman@gmail.com</a>&gt; wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

On 11 March 2012 00:25, Sean Cavanaugh &lt;<a href="mailto:WorksOnMyMachine@gmail.com" target="_blank">WorksOnMyMachine@gmail.com</a>&gt;<br>

wrote:<br>

 On 3/10/2012 4:37 AM, Manu wrote:<br>

</blockquote></blockquote></div></blockquote>

<br>

[snip]<br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


Manu, please go read the D ABI (<a href="http://dlang.org/abi.html" target="_blank">http://dlang.org/abi.html</a>). Remember,<br>

your example of returning two values using Tuple vs 'real' MRV? The D ABI<br>

states that those values will be returned via registers. Returning<br>

something larger? Then NRVO kicks in, which gives you a zero-copy<br>

approach. On x86-64 these limits are different, since you have more<br>

registers to play with, but the concept is the same. In fact, returning<br>

arguments has always been more efficient than passing arguments.<br>

<br>

</blockquote>

<br></div><div class="im">

Please go read my prior posts. This has absolutely no bearing on what I'm<br>

talking about. In fact, it fuels my argument in some cases.<br>

<br>

That said, I've read it, and it scares the hell out of me. D states that a<br>

small struct may be returned in up to 2 registers (8 bytes/16 bytes); I<br>

suspect this is a hack introduced specifically to make ranges efficient?<br>

If it is not simply a struct of 2 register sized things, they get packed<br>

into a magic 8-16 byte struct implicitly for return, and this makes my<br>

brain explode. If I wanted to perform a bunch of shifts, ORs, and ANDs<br>

until it's snugly packed into an 8-16 byte block before returning, I'd do<br></div>

that explicitly... I DON'T want to do that however, and I *really* don't<div class="im"><br>

want the language doing that for me. If I want to return 4 byte values,<br>

they should be returned in 4 registers, not packed together into a 32bit<br>

word and returned in one. Also, if there are mixed ints, floats, vectors,<br>

even other structs, none of it works; what if there is float data? Swapping<br>

registers now? That's one of the worst penalties there is.<br>

</div></blockquote>

<br>

Except that a 4-byte struct is packed into a single memory word, and that it's impossible to return using 4 registers in D (EBX is treated specially; I think it's for TLS and stuff, but I'm not sure). As for the rest, all of these low-level optimizations are standard C++ techniques from back when computers were still 16-bit. Walter has updated the D backend as little as possible, so pretty much everything comes from how DMC++ did things. It's also how Microsoft, Apple and *nix do things.<br>

</blockquote><div><br></div><div>x86 is not the only architecture on earth; it's arguably not even a particularly important one commercially with respect to realtime software. Also, from a codegen point of view, it's usually the least important architecture by a mile, since x86 chips don't actually run the code they receive; they translate it into microcode and perform a crap load of clever optimisations in the process. Most other architectures have many more registers, and use many more of them for passing args. They also generally suffer much greater penalties than x86 for pointless packing and memory access.</div>

<div><br></div><div>What I'm talking about is a language feature that implements (and guarantees) a policy to return many things in EXACTLY the same way that it passes many args TO a function. This is already the most efficient way to pass many things between functions in any given architecture.</div>

<div><br></div><div>In addition to that, I'm also discussing the usefulness of a nice sugary syntax to do this at the same time, for clarity and productivity.</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

GCC-64, for example, does the following: "Entire object is returned in integer registers and/or XMM registers if the size is no bigger than 128 bits, otherwise on the stack. Each 64-bit part of the object is transferred in an XMM register if it contains only float or double, or in an integer register if it contains integer types or mixed integer and float. Two consecutive float’s can be packed into the lower half of one XMM register.</blockquote>

<div><br></div><div>They shouldn't need to be packed into anything; they already live in their own registers. The ABI already specifies a certain number of argument registers, and a function can return in those at zero cost.</div><div>And again, you're only considering x86.</div>
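<div><br></div><div>Read literally, the rule quoted above boils down to something like the following. This is only a simplified model in C, not how any real compiler implements it; ret_class and classify_part are names made up for illustration, and the case of two floats sharing one XMM register is left out.</div>

```c
/* Simplified model of the quoted return rule. All names are hypothetical;
   this is a sketch, not a real ABI implementation. */
enum ret_class { RET_INTEGER_REG, RET_XMM_REG, RET_MEMORY };

/* Classify one 64-bit part of a returned object.
   total_bits:  size of the whole returned object, in bits.
   has_integer: nonzero if this 64-bit part holds any integer field. */
enum ret_class classify_part(unsigned total_bits, int has_integer)
{
    if (total_bits > 128)
        return RET_MEMORY;       /* too big: goes through memory ("on the stack") */
    if (has_integer)
        return RET_INTEGER_REG;  /* integer, or mixed integer and float */
    return RET_XMM_REG;          /* only float or double */
}
```

<div>Under this model a struct of two doubles comes back in two XMM registers, a struct of two 64-bit ints in two integer registers, and anything over 128 bits through memory, no matter how many argument registers sit idle.</div>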

<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> Consecutive double’s are not packed. No more than 64 bits of each XMM register is used. Use P if not enough vacant registers. P: Pointer to temporary memory space passed to function. Pointer may be passed in register if fastcall or 64-bit mode, otherwise on stack. Same pointer is returned in AX, EAX or RAX."<br>

</blockquote><div><br></div><div>Wasteful; it still has more arg registers available. It's limiting itself to traditional C conventions.</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


Manu, if this is truly only breaking your head now, then all I can conclude is that you opened this discussion on low-level function return optimizations with absolutely zero knowledge of the subject matter you were trying to discuss; I find this very frustrating.<br>


</blockquote></div><div><br></div><div>I have 15+ years of experience with C on basically every architecture you could name; I'm absolutely aware of what C ABIs look like with regard to returning structures by value.</div>

<div>I find the common assumption that all computers on earth are x86 perhaps even more frustrating, as is the fact that you've missed the point about using ALL the argument registers to avoid pointless packing/unpacking and memory access entirely.</div>

<div>C is incapable of expressing MRV and doesn't have an ABI for it, so talking about C compiler optimisation tricks is irrelevant. D should define an MRV ABI that is precisely the ABI for passing multiple args TO a function, but in reverse, for any given architecture. This also has the lovely side effect of guaranteeing correct argument placement for chain-called functions.</div>
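<div><br></div><div>To make that concrete, here is a rough C sketch of the two workarounds C offers in place of MRV, plus a chain call done by hand. Every name in it (divmod_result, divmod_struct, divmod_out, combine, chain) is made up for illustration, and how each version is actually passed depends on the target ABI and the compiler.</div>

```c
/* Hypothetical names throughout; a sketch, not a proposal for real syntax. */
struct divmod_result { int quot; int rem; };

/* Workaround 1: pack the results into a struct and return it by value.
   How (or whether) it travels in registers is up to each platform's ABI. */
struct divmod_result divmod_struct(int a, int b)
{
    struct divmod_result r;
    r.quot = a / b;
    r.rem = a % b;
    return r;
}

/* Workaround 2: return one value and write the other through a pointer,
   which, without inlining, sends the second result through memory. */
int divmod_out(int a, int b, int *rem)
{
    *rem = a % b;
    return a / b;
}

/* A consumer that takes the two values as ordinary arguments. */
int combine(int quot, int rem)
{
    return quot * 10 + rem;
}

/* Chaining by hand: the caller unpacks the struct into arguments. */
int chain(int a, int b)
{
    struct divmod_result r = divmod_struct(a, b);
    return combine(r.quot, r.rem);
}
```

<div>chain(17, 5) evaluates to 32 here. With an MRV ABI that mirrors the argument-passing ABI, the two results of the first call would already sit where combine expects its arguments; in C, whether that happens is left to the target ABI and the optimiser.</div>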