Thoughts on possible tuple semantics

Wed Aug 21 09:32:15 PDT 2013

On Wednesday, 21 August 2013 at 15:22:59 UTC, Dicebot wrote:
> Inspired by the recent syntax thread, I decided to write down my 
> vision of how native language tuples should behave and how they 
> may integrate into the existing language state. There are a lot of 
> tricky corner cases and I'd like to see some more opinions on 
> the topic before trying a DIP with a full deprecation path design.
>
> I didn't care about syntax or exact tuple features such as 
> unpacking. The main questions for me were "what is this thing 
> actually?" and "how can it be expressed in existing language terms?"
>
> ----- Core -----
>
> Define two distinct built-in "tuples":
>
>     - Template argument sequence: may contain anything that is 
> legal template
>         argument.
>     - Value tuple or simply "tuple": may contain any values, 
> including run-time
>     values. Value storage location is not defined.
>
> Template argument sequence that contains only types is called 
> "type sequence"
> and is considered a type on its own. Type of value tuple is a 
> type sequence.
>
> Mixed template argument sequences are considered types too, but 
> special one. Those
> types can be aliased or passed to templates but can't be 
> instantiated.
>
> Each of these two entities has its own literal type. This is 
> required to avoid
> ambiguity between storing symbol as a type and taking its value 
> on run-time. Tuple
> always does the latter.
>
> In further examples I will use imaginary syntax:
>     ctseq(int, string, 42) : template argument sequence
>     tuple("one", 2, three) : value tuple
>
>>>>>> 
>
> // connections to existing template syntax
>
> void foo(T...)(T args)
> {
>     static assert(is(T == ctseq(int, string)));
>     static assert(is(typeof(args) == T));
>     assert(args == tuple(1, "2"));
>     int a = 1;
>     string b = "2";
>     assert(args == tuple(a, b));
>     static assert(is(typeof(tuple(a, b)) == ctseq(int, string)));
> }
>
> foo(1, "2");
>
> // type semantics of type sequence
>
> ctseq(int, int) twoVars;
> twoVars[0] = 42;
> twoVars[1] = 43;
> assert(twoVars == tuple(42, 43));
>
> // compile-time error, can't instantiate mixed template argument sequence
> ctseq(int, 42) twoVars;
> // NOT the same, breaking change
> assert(twoVars != ctseq(42, 42));
>
> // compile-time vs run-time vs type semantics
>
> auto a1 = tuple(42, 42); // ok
> auto b1 = ctseq(42, 42); // error
>
> enum a2_1 = tuple(42, 42); // ok
> int a, b;
> enum a2_2 = tuple(a, b); // error
> enum b2 = ctseq(42, 42); // error, breaking change
>
> alias a3 = tuple(42, 42); // error
> alias b3 = ctseq(42, 42); // ok
>
> <<<<<
>
> ----- (auto) expansion / packing / unpacking -----
>
> As Andrei has stated clearly that he does not like 
> auto-expansion and considers
> it a major mistake, I was trying to imagine how that idea can 
> be incorporated
> into idiomatic D code.
>
>>>>>> 
>
> // existing syntax
>
> // args can't be a single entity and use the normal parameter-passing ABI
> // at the same time.
>
> void foo(T...)(T args) // following normal ABI implies unpacking
> {
> }
>
> foo(1, 2, 3); // automatic packing
>
> <<<<<
>
> Breaking something like this does not seem reasonable. But I 
> think salvation is the "..." part. It may be explicitly defined 
> to mean "implicit unpacking" and can be used with plain tuple 
> code like this:
>
>>>>>> 
>
> void boo(T)(T args)
> {
>     foo(args.expand);
> }
>
> boo(tuple(1, 2, 3));
>
> <<<<<
>
> One thing to consider is .tupleof - should it result in an actual 
> tuple or maintain the current behavior? The former is probably more 
> reasonable but it means even more breakage:
>
> struct S { int a, b, c; }
>
> foo(S.init.tupleof.expand); // huh
>
> .expand should probably be defined as a simple syntax rewrite:
>     - "tuple(a, b)" to "a, b" for literals
>     - "tupleVar" to "tupleVar[0], tupleVar[1]" for variables
>
> That also implies that packing / unpacking syntax, whatever it 
> can be,
> is completely unrelated to expansion - former can't be 
> expressed as a simple
> syntax rewrite, latter can't have special semantics tied to it 
> without creating
> even more meta-type to represent it.
>
> ----- ABI -----
>
> Once built-in value tuples get recognized as a distinct entity, 
> there is no reason not to allow using them for return values or as 
> un-expanded parameters.
>
> All that is needed is to define that tuple(a, b, c) has the same ABI 
> for return values and parameters as a 
> struct Tuple { typeof(a) a; typeof(b) b; typeof(c) c; } - I have been 
> told that there are some issues with that approach but with no 
> clear explanation.
>
> Mangling question remains open.
>
>
> ----- std.typetuple.TypeTuple / std.typecons.Tuple -----
>
> No need to keep them other than for backwards compatibility ;)

I like it :)

The question of ABI is interesting. I can think of a few options, 
working relative to the System V x86_64 ABI (read: C on Linux x64), 
as it's what I'm familiar with:

treat tuples as structs:

advantages: simple to implement, easy to interact with other ABI 
compliant code.

disadvantages: returning a tuple larger than 16 bytes uses up an 
extra register on the *calling* side, as a struct return of that 
size is done via a pointer in RDI (i.e. the 1st argument) to 
caller-allocated stack memory. This introduces an extra 
indirection, so it's not the fastest option. Passing large tuples 
has a similar cost, since they go through memory rather than 
registers.
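To make the "tuples as structs" option concrete, here is a minimal sketch in today's D using std.typecons.Tuple, which is already an ordinary struct. The names Small, Large, makeSmall and makeLarge are made up for illustration. The comments about registers assume the compiler follows the System V x86_64 C rules for structs (aggregates up to 16 bytes can come back in registers, larger ones through a hidden pointer); whether a given D compiler does exactly that for extern(D) functions is an assumption, not something this code checks.

```d
import std.typecons : Tuple, tuple;

// Hypothetical aliases, for illustration only.
alias Small = Tuple!(int, int);          // 8 bytes
alias Large = Tuple!(long, long, long);  // 24 bytes

// The library tuple is a plain struct, so it would follow the struct ABI.
static assert(is(Small == struct));
static assert(Small.sizeof == 8);
static assert(Large.sizeof == 24);

// Small enough that a C-like ABI can return it in registers.
Small makeSmall(int a, int b)
{
    return tuple(a, b);
}

// Larger than 16 bytes: under the System V C rules (assumed here) the
// caller allocates the result and passes a hidden pointer to it.
Large makeLarge(long a, long b, long c)
{
    return tuple(a, b, c);
}
```

Only the sizes are verified at compile time; the actual register usage would have to be confirmed by looking at the generated assembly.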

treat tuples as separate arguments (when possible):

This would mean defining a new ABI on the returning side. If we 
used something like the System V *calling* ABI, then we'd get:

advantages: no indirection, arguments are ready in registers for 
callee, results are ready in registers for the caller to access 
quickly. Fast. Manu suggests that the advantages are greater on 
non-x86 architectures.

disadvantages: not compatible with other ABI compliant code. 
Increased register pressure for both caller and callee in some 
circumstances. Could require some extra movs from stack to 
registers on the caller side, depending on where the tuple was 
previously stored or is later needed.

A consideration for all of this: I predict we would quickly start 
seeing a lot of code that takes a tuple and returns a modified 
version.
Using the struct option:
if larger than 16 bytes, pass a pointer to the old tuple-struct and 
to the new one. Return the address of the new one in RAX. 
Indirection, but not too bad.

Using the separate option:
pass tuple members in separate registers (when small enough). If 
they're already in registers and the callee doesn't have to move 
them out, this couldn't be any faster.* Could (very rarely these 
days) result in arguments overflowing onto the stack.

*movs between registers are essentially free on modern x86/64
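
The "take a tuple, return a modified version" pattern can be sketched at the source level with the existing library tuple. This is only an illustration: Triple, bumpStruct and bumpSeparate are hypothetical names, and the code says nothing about which registers a compiler actually picks. It just contrasts passing the tuple as one struct value with passing its members as separate arguments via the existing .expand property of std.typecons.Tuple.

```d
import std.typecons : Tuple, tuple;

// Hypothetical alias, for illustration only.
alias Triple = Tuple!(long, long, long);

// "Struct option": the tuple travels as a single struct value,
// both as the parameter and as the return value.
Triple bumpStruct(Triple t)
{
    t[0] += 1;
    return t;
}

// "Separate option", on the parameter side only: each member is its own
// argument, so a C-like ABI can place them in individual registers.
// The return value is still a struct here, because current D has no way
// to express a multi-register tuple return.
Triple bumpSeparate(long a, long b, long c)
{
    return tuple(a + 1, b, c);
}
```

With `auto t = tuple(1L, 2L, 3L);`, both `bumpStruct(t)` and `bumpSeparate(t.expand)` produce a tuple equal to `tuple(2L, 2L, 3L)`; the difference under discussion is purely in how the values would be handed over at the ABI level.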