A Discussion of Tuple Syntax

Fri Aug 16 14:07:50 PDT 2013

A while ago Kenji posted this excellent DIP
(http://wiki.dlang.org/DIP32) that aimed to improve tuple syntax
and described several cases in which tuples could be
destructured. You can see his original thread here:
http://forum.dlang.org/thread/mailman.372.1364547485.4724.digitalmars-d@puremagic.com, 
and further discussion in this thread: 
http://forum.dlang.org/thread/dofwinzpbcdwkvhzcgrk@forum.dlang.org.

It seemed that there was a lot of interest in having syntax 
somewhat like what is described in Kenji's DIP, but it didn't 
really go anywhere. There is a pull request on GitHub
(https://github.com/D-Programming-Language/dmd/pull/341), but it
uses the (a, b) syntax, which overlaps too much with other
language constructs. Andrei and Walter didn't want to merge that
pull request without fully considering the different design
issues involved, which in retrospect was a good decision.

That said, I'd like to open the discussion on tuple syntax yet
again. Tuples are currently sorely underused in D, in large part
because they are difficult to understand and awkward to use. One
large barrier to entry is the fact that D has not 1, not 2, but
3 different types of tuples (depending on how you look at it),
which are difficult to keep straight.

There is std.typecons.Tuple, which is fundamentally different
from std.typetuple.TypeTuple in that it's implemented as a struct,
while TypeTuple is just a template wrapped around the compiler's
built-in tuple type. ExpressionTuples are really just TypeTuples
that contain only values, and they aren't mentioned anywhere
except in this article: http://dlang.org/tuple.html, which
frankly creates more confusion than clarity.
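To make the three kinds a little more concrete, here is a rough sketch in current D. It only uses std.typecons.Tuple/tuple and std.typetuple.TypeTuple (later Phobos releases also spell the latter std.meta.AliasSeq); makePair, Types, values and sum3 are made-up names for illustration only.

```d
import std.stdio : writeln;
import std.typecons : Tuple, tuple;
import std.typetuple : TypeTuple;

// std.typecons.Tuple is an ordinary struct: a single run-time value
// that can be stored, passed around and returned from functions.
Tuple!(int, string) makePair()
{
    return tuple(1, "a");
}
static assert(is(Tuple!(int, string) == struct));

// TypeTuple wraps the compiler's built-in tuple. Holding types, it is a
// compile-time list with no run-time representation of its own.
alias Types = TypeTuple!(int, string, double);
static assert(Types.length == 3 && is(Types[0] == int));

// The same template holding values is what tuple.html calls an
// "expression tuple".
alias values = TypeTuple!(1, "a", 0.0);

// Expression tuples expand automatically into argument lists.
int sum3(int a, int b, int c)
{
    return a + b + c;
}

void main()
{
    auto p = makePair();
    writeln(p[0], " ", p[1]);            // prints: 1 a

    Types vars;                          // declares three variables: int, string, double
    vars[0] = 42;
    writeln(vars[0]);                    // prints: 42

    writeln(sum3(TypeTuple!(1, 2, 3)));  // prints: 6
    writeln(values[1]);                  // prints: a
}
```

The point of the sketch is that only the first of these is a value you can hold on to; the other two exist purely at compile time, which is a big part of why they are so easy to confuse.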

A good, comprehensive design could make tuples easy to use and
understand, and hopefully clear up the current unpleasant
situation. A summary of what has been discussed so far:

- (a, b) is the prettiest syntax, but it is also completely
infeasible, since it clashes with the comma operator, among
other things

- {a, b} is not as pretty, but it's not a bad alternative
(though it may have issues of its own)

- #(a, b) is unambiguous and would probably be the easiest 
option. I don't think it looks too bad, but some people might 
find it ugly and noisy

- How should tuples be expanded? There is the precedent of
std.typecons.Tuple's .expand, but Kenji liked tup[] (slicing
syntax). So given tup = #(1, "a", 0.0), tup[0..2] would be an
expanded tuple containing 1 and "a". On the other hand,
Bearophile and Timon Gehr preferred that slicing a tuple create
another "closed" tuple, with .expand used for expansion. So tup[]
would create a copy of the tuple, and tup[0..2] would create a
closed tuple equivalent to #(1, "a"). I don't have any particular
preference in that regard.

- Timon Gehr wanted the ability to swap values through tuple
assignment, so that #(x, y) = #(y, x) would be allowed. Kenji was
against it, saying that it would introduce too many
complications.

- There was no consensus on the pattern matching syntax for
unpacking. For example, #(a, _) = #(1, 2) only introduces one
binding, "a", into the surrounding scope. The question is, what
character should go in the place of "_" to signify that a value
should not be bound? Some suggestions were #(a, $), #(a, @), and
#(a, ?). I personally think #(a, ?) or #(a, *) would be best, but
all that's really necessary is a symbol that cannot also be an
identifier (which rules out "_" itself, since it is a valid
identifier in D).

     Also up for debate were nested patterns, e.g.,
#(1, 2, #(3, 4, #(5, 6))). I don't think there was a consensus
on unpacking and pattern matching for this situation. One idea I
saw that looked good:

         * Use "..." to pattern match on the tail of an
expression. Taking the tuple above, the pattern #(1, ?, ...)
would match it, with "..." covering the nested sub-tuples.
Similarly, #(1, 2, 3) could be matched by #(1, 2, 3),
#(1, ?, 3), #(1, ...), etc. You obviously can't refer to "..."
as a variable, so it also becomes a useful way of saying
"don't care" for multiple items, e.g., #(a, ...) -> only bind
the first item in the tuple. We can play around with this to get
a few other useful constructs, such as #(a, ..., b) -> match
first and last, #(..., b) -> match last, etc.

Assuming the "..." syntax for unpacking, it would be useful to
name the captured tail. For example, you could unpack
#(1, 3, #(4, 6)) into #(a, b, x...), where a = 1, b = 3,
x = #(4, 6). Similarly, unpacking the same tuple into
#(head, rest...) results in head = 1, rest = #(3, #(4, 6)). I
think this would be very useful.

- Concatenating tuples with ~. This is nice to have, but not 
particularly important.
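For comparison, here is roughly how the operations in the list above are spelled with today's std.typecons.Tuple: .expand for an "open" expansion, slice!(from, to) for a "closed" sub-tuple, plain indexing in place of unpacking patterns, std.algorithm's swap, and re-packing expanded fields for concatenation. concat and describe are hypothetical helpers, not Phobos functions; this is only a sketch of the status quo, not of the proposed syntax.

```d
import std.algorithm : swap;
import std.stdio : writeln;
import std.typecons : Tuple, tuple;

// Hypothetical helper: concatenation today means expanding both tuples
// into a new one, since Tuple has no ~ operator.
auto concat(A, B)(A a, B b)
{
    return tuple(a.expand, b.expand);
}

// Hypothetical helper used to show an expanded tuple as an argument list.
string describe(int n, string s)
{
    import std.conv : to;
    return s ~ "=" ~ n.to!string;
}

void main()
{
    auto tup = tuple(1, "a", 0.0);

    // "Open" expansion: .expand is the fields as a compile-time sequence,
    // so a sub-range of it can be passed straight into an argument list.
    writeln(describe(tup.expand[0 .. 2]));   // prints: a=1

    // "Closed" slice: slice!(from, to) gives another Tuple.
    auto firstTwo = tup.slice!(0, 2);
    static assert(is(typeof(firstTwo) == Tuple!(int, string)));

    // Stand-ins for #(a, ...) and #(head, rest...): index and slice by hand.
    auto head = tup[0];
    auto rest = tup.slice!(1, 3);
    static assert(is(typeof(rest) == Tuple!(string, double)));
    writeln(head, " ", rest[0]);             // prints: 1 a

    // Stand-in for #(x, y) = #(y, x).
    int x = 1, y = 2;
    swap(x, y);
    writeln(x, " ", y);                      // prints: 2 1

    // Stand-in for tup ~ #(true).
    auto longer = concat(tup, tuple(true));
    static assert(is(typeof(longer) == Tuple!(int, string, double, bool)));
}
```

Everything here works, but each operation has its own spelling, and the slice bounds have to be written out by hand, which is exactly the kind of friction the proposed syntax is meant to remove.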

One thing that I think was overlooked, but would be pretty cool, 
is that a tuple unpacking/pattern matching syntax would allow us 
to unpack/pattern match just about anything that you can make a 
tuple of in D. Combine this with the .tupleof property, and 
things get interesting... Maybe. There is one possible problem: 
.tupleof returns a TypeTuple, and it's not at all clear to me 
how, if at all, TypeTuple would work with the proposed syntax. Is 
#(int, string, bool) a valid tuple instantiation? This is 
something that needs to be worked out.
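A quick sketch of what .tupleof gives you today, using a made-up Person struct: on a value, .tupleof is a sequence of the field values, and typeof() of it is the matching sequence of types, which is exactly the #(int, string, bool) case in question. It can already be repacked into a std.typecons.Tuple or expanded into a constructor call.

```d
import std.stdio : writeln;
import std.typecons : Tuple, tuple;

// A hypothetical struct, used only for illustration.
struct Person
{
    int age;
    string name;
    bool active;
}

void main()
{
    auto p = Person(30, "alice", true);

    // On a value, .tupleof is an expression tuple of the fields,
    // which tuple() packs into an ordinary Tuple struct.
    auto t = tuple(p.tupleof);
    static assert(is(typeof(t) == Tuple!(int, string, bool)));
    writeln(t[1]);                         // prints: alice

    // typeof() of it is the corresponding type tuple: (int, string, bool).
    alias FieldTypes = typeof(p.tupleof);
    static assert(FieldTypes.length == 3 && is(FieldTypes[1] == string));

    // The fields also expand straight into an argument list.
    auto q = Person(p.tupleof);
    writeln(q == p);                       // prints: true
}
```

So the value side already round-trips through a Tuple; it's the type-only side (FieldTypes above) that has no obvious meaning as a #(...) value.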

This is the third or fourth time that I know of that tuple syntax
has come up, and so far nothing has been done about it. I'd
really like to get the ball rolling on this, as I think a good
syntax for these tuple operations would do D a world of good. I'm
not a compiler hacker, unfortunately, so I can't implement it
myself as a proof of concept. However, I hope that discussing it
and working out all the kinks will help pave the way for an
actual implementation.