A Discussion of Tuple Syntax

Tue Aug 20 18:43:45 PDT 2013

Andrei Alexandrescu:

> 1. What do we need?

We can live fine without a syntax to manage tuples, so strictly 
speaking we need nothing (almost, I'd like to kill one currently 
accepted syntax, see below).

But when you write D in a high-level style, and you are used to 
other functional or scripting languages, in several cases you 
desire a more handy/compact syntax to manage tuples.

Tuples are simple data structure, so the operations you want to 
do with them are few and simple:
- To define a tuple of various fields of different type, that is 
a tuple literal, to create a tuple;
- A way to read and write items of a tuple (unless it's 
read-only), D uses a nice syntax [i] as arrays indexing, but i 
can't be a run-time value.
- With the Phobos tuple we are used to giving names to fields. 
It's handy, but it's not so necessary if you have a way to assign 
tuple items to variables, defined in-place. There are several 
places where pulling apart a tuple in that way is useful, here I 
use a basic syntax to be more clear:

Assignments:
auto (a, b) = t1;

In function signatures:
auto mult = ((int x, int y)) => x * y;

In foreach loops:

void main() {
   import std.range, std.stdio;
   auto a = [10, 20];
   auto b = [100, 200];

   // Currently D supports this syntax:
   foreach (x, y; zip(a, b))
     writeln(x, " ", y);

   // But it's unreliable, now x is the index of the
   // array instead of the first tuple fields, so I
   // think this feature of D should be killed as
   // soon as possible:
   foreach (x, y; zip(a, b).array)
     writeln(x, " ", y);
}

Its output:

10 100
20 200
0 Tuple!(int, int)(10, 100)
1 Tuple!(int, int)(20, 200)

In (final) switch cases, this also shows one use case and one 
usefulness of wildcards (and the usage of an unapply() standard 
method, as explained a bit here: 
http://d.puremagic.com/issues/show_bug.cgi?id=596 ):

final switch (tup1) {
     case (0, 10): break;
     case (0,  ?): break;
     case (?,  ?): break;
}

An nice example usage of such basic pattern matching is visible 
here, to implement the insertion in a red-black-tree. This is a 
well known hairy operation to do in languages as C:
http://rosettacode.org/wiki/Pattern_matching

This is how insertion and balancing is done in Haskell (this 
needs long lines and it probably comes out unreadable here, so 
see here for a better visualization: 
http://rosettacode.org/wiki/Pattern_matching#Haskell ):

data Color = R | B
data Tree a = E | T Color (Tree a) a (Tree a)

balance :: Color -> Tree a -> a -> Tree a -> Tree a
balance B (T R (T R a x b) y c          ) z d                     
           =
     T R (T B a x b) y (T B c z d)
balance B (T R a           x (T R b y c)) z d                     
           =
     T R (T B a x b) y (T B c z d)
balance B a                               x (T R (T R b y c) z d  
         ) =
     T R (T B a x b) y (T B c z d)
balance B a                               x (T R b           y (T 
R c z d)) =
     T R (T B a x b) y (T B c z d)
balance col a x b = T col a x b

insert :: Ord a => a -> Tree a -> Tree a
insert x s = T B a y b where
   ins E          =  T R E x E
   ins s@(T col a y b)
     | x < y      =  balance col (ins a) y b
     | x > y      =  balance col a y (ins b)
     | otherwise  =  s
   T _ a y b = ins s

There are few more little pieces that are useful, like unpacking 
an array:

string s = "red blue";
auto (col1, col2) = s.split;

Hara has suggested pattern matching in if statements, that is a 
special case of the pattern matching in switch cases, and it's 
sometimes useful (similar code is common in Scala):

if (auto {1, y} = tup) {...}

That is equivalent to:

if (tup[0] == 1) {
     auto y = tup[1];
     ...
}

It looks simple and not so essential, but when tuples become a 
little more more complex pattern matching shows it usefulness 
better, as in the Haskell example.

Languages as Haskell and Rust also show that even a simple 
version of pattern matching is handy when you manage types like 
Nullable() (that need to define an unapply() standard method).

(It's not unthinkable to support pattern matching in function 
signatures too, as often used in many functional languages as 
OCaML and Haskell, but this is an orthogonal feature, it could be 
added later, and we can probably live without it for now).

Others have suggested a syntax with ... to gather more than one 
item, or to ignore more than one item with ?...  Such syntax is 
handy with arrays but with tuples, that are often short and of 
well known size I think it's not so much useful, so it can be 
left for later.

> 2. What does a solution look like in the current language?

The DIP32 shows a complete very small program that I wrote for 
RosettaCode site, it computes the Huffman encoding of a given 
string:
http://wiki.dlang.org/DIP32

import std.stdio, std.algorithm, std.typecons, std.container, 
std.array;

auto encode(T)(Group!("a == b", T[]) sf) {
     auto heap = sf.map!(s => tuple(s[1], [tuple(s[0], "")]))
                 .array.heapify!q{b < a};

     while (heap.length > 1) {
         auto lo = heap.front; heap.removeFront;
         auto hi = heap.front; heap.removeFront;
         foreach (ref pair; lo[1]) pair[1] = '0' ~ pair[1];
         foreach (ref pair; hi[1]) pair[1] = '1' ~ pair[1];
         heap.insert(tuple(lo[0] + hi[0], lo[1] ~ hi[1]));
     }
     return heap.front[1].schwartzSort!q{tuple(a[1].length, a[0])};
}

void main() {
     auto s = "this is an example for huffman encoding"d;
     foreach (p; s.dup.sort().release.group.encode)
         writefln("'%s'  %s", p[]);
}

That little program shows only three use cases of a tuple syntax: 
in function signatures, in normal assignments, and in foreach:

import std.stdio, std.algorithm, std.container, std.array;

auto encode(T)(Group!("a == b", T[]) sf) {
     auto heap = sf.map!((c, f) => (f, [(c, 
"")])).array.heapify!q{b < a};

     while (heap.length > 1) {
         auto (lof, loa) = heap.front;  heap.removeFront;
         auto (hif, hia) = heap.front;  heap.removeFront;
         foreach ((_, ref e); loa) e = '0' ~ e;
         foreach ((_, ref e); hia) e = '1' ~ e;
         heap.insert((lof + hif, loa ~ hia));
     }
     return heap.front[1].schwartzSort!((c, e) => (e.length, c));
}

void main() {
     auto s = "this is an example for huffman encoding"d;
     foreach ((c, e); s.dup.sort().release.group.encode)
         writefln("'%s'  %s", c, e);
}

For an use case of the assignment from an array, let's say you 
have a text file containing two numbers in a line, that you want 
to read:

const (x, y) = filein.byLine.front.split.to!(int[]);

An example of implementation in a red-black-tree of the insert 
and balance in C is not too much hard to find with Google. But 
it's lot of bug-prone code.

Bye,
bearophile