Wider tuple design discussion

Tue Aug 2 06:15:52 PDT 2011

Walter has asked for a more global vision on tuples before implementing their unpacking, so need to show the discussion here again:
http://d.puremagic.com/issues/show_bug.cgi?id=6365#c26

Walter:
> My main reservation about this is not that I can see anything obviously wrong
> about it, but I am nervous we are building a special case for tuples. I think
> any expansion of tuple support should be part of a more comprehensive design
> for tuples.
> 
> If we grow tuple support by adding piecemeal special cases for this and that,
> without thinking about them more globally, we are liable to build ourselves
> into a box we can't get out of.
> 
> I can give examples in other languages that did this - the most infamous being
> the great C idea of conflating arrays and pointers. It seems to be a great idea
> in the small, but only much larger experience showed what a disastrous mistake
> it was.
> 
> We should step back and figure out what we want to do with tuples in the much
> more general case.

More stuff I've written:
http://d.puremagic.com/issues/show_bug.cgi?id=6367
http://d.puremagic.com/issues/show_bug.cgi?id=6383

A global design for tuples in D was right what I have tried to avoid, because I have thought that Andrei was against a full tuple built-in design and that generally there was not so much desire for large changes in D2 now. With that syntax sugar suggestion I have tried to find the simpler and more essential D change that allows me to use tuples in a more handy way. Now I'm back to the start. And I still fear that a better and more comprehensive tuple design will be refused in D. So I fear there is no way out: if you design a minimal change Walter fears of corner cases and myopic design, and if you design a comprehensive tuple, it risks being a too much big change for this stage of D evolution. In both cases the end result is of not having good enough tuples in D.

In the end Walter is right. Better to not have built-in support for tuples than having some wrongly designed built-in syntax sugar for tuples. So let's think about the only viable alternative, a wider design, and hope.

When you design something you need first of all to think about what your tool will be used for.

Tuples are used all the time in functional-style languages, usually once every line of code or so. In Python and Haskell I use them all the time. In D I already use tuples, but they are not so handy.

Tuples are not essential in D. You are able to write large programs without tuples. But they are sometimes handy, especially if you use a more functional-style programming.

Tuples do have some disadvantages. Their fields often are anonymous, this goes against the typical well defined style of D/Java/Ada programs, where you know the name of what you are using. So tuples are better if you use them locally, or if they have only few items. Long tuples become hard to use because you risk forgetting what each field is, so the most common tuples have between 2 and 5 items. In Python you are also allowed to define a tuple with zero items, and a tuple with one item, but in my experience they are not so useful (in Python you sometimes use longer tuples made of items of the same type, because you are using them essentially as immutable arrays. This is considered not pythonic). D tuples allow a name for the fields, this is useful at usage point, because instead of a "[2]" you use (example) ".isOpen", that's more descriptive.

A tuple isn't an object, it's more like a record, it's almost like a more handy POD. So tuples are data structures that you use with code (functions) that they don't contain. OOP design is against this. But I am not fond of pure OO design. There are many situations where you want something more handy, more functional style, or more in "abstract data structure"-style. Python, D, Scala and other languages recognize this.

In some languages (like Python) tuples are immutable, while in D they are the opposite, you currently can't give them immutable fields. In D tuples are values, while in Python they are used by kind of reference (by name). In functional languages you can't how they managed, because there is referential transparency (so even if there is a reference, it's invisible. So the compiler is free to implement them as it wants, and optimize as much as it can).

In some languages tuples are managed with a nominal type system, while in most other situations they are managed by a structural type system (that means that two tuples are seen of different type only if the types of their fields differ). (Time ago I have even suggested a @structural attribute to be used in the Phobos code that defines D tuples).

There are many operations you want to perform on tuples, the most basic ones are:
0) define the type of a tuple
1) create a tuple
2) copy a tuple
3) Read a tuple field
4) Write a tuple field
5) Unpack tuple fields into variables, this is useful in 3 or more different situations:
5.1) The most basic one is unpacking a tuple locally, inside a function.
5.2) Another common situation is unpacking a small tuple inside a foreach statement, when you iterate on a collection of tuples.
5.3) Another usage (quite common in Haskell and similar languages) is to unpack a tuple in the signature of a function. Haskell even allows to give names at the same time to both fields and to the whole tuple.

Other operations are:
6) Slice a tuple like a Python/D array: tup[1..3]. (This is already supported in D, but you can't use the normal slice syntax, you need to use tup.slice!(1, 3) ).
7) Concat two or more tuples (using ~ and ~= operators).

There are many other operations (remove a field, rename a field, etc), but they are not commonly useful. If you know a common need that I have not listed please show it.

It's better to show some examples of those 5.1, 5.2 and 5.3.

5.1 is useful for functions that return more than one value too:
auto (x, y) = tuple(1, 2);
(int x, int y) = foo(100);
(int x, auto y, auto z) = bar("hello");
The proposal in 6365 was to implement this very common case, that currently is not good enough in D.

5.2 (a different syntax is possible, this is just an example):
auto array = [tuple(1,"foo"), tuple(2,"bar")];
foreach (tuple(id, name); array) {}

5.3) This is already possible in D using the typetuple coming from a tuple using the (undocumented?) .tupleof. But this has the problem of losing some safety, this compiles with no errors:

import std.typecons;
void bar(int i, string s) {}
void main() {
    auto t1 = tuple(1, "foo");
    bar(t1.tupleof);
    auto t2 = tuple(1);
    auto t3 = tuple("foo");
    bar(t2.tupleof, t3.tupleof);
}

This is not allowed in Python2 and Haskell, because tuples are true solid tuples at the unpacking point too, in the function signature.

To implement the feature 5.3 well you need something the enforces the tuple length at the unpacking point too, something more like this:

void bar(Tuple!(int i, string s)) {
    // here use varibles s and i 
}

In Python there are more features similar to the point (5). Yo are allowed to unpack a dynamic array too into variables, and even a lazy generator. This is so handy and useful that I have suggested to allow the same thing in D too:
http://d.puremagic.com/issues/show_bug.cgi?id=6383

Some usage examples, assign from a dynamic array, in D2:

import std.string;
void main() {
    auto s = " foo bar ";
    auto ss = s.split();
    auto sa = ss[0];
    auto sb = ss[1];
}

In Python2.6:

>>> sa, sb = " foo bar ".split()
>>> sa
'foo'
>>> sb
'bar'

Proposed:

import std.string;
void main() {
    auto s = " foo bar ";
    auto (sa, sb) = s.split();
}

>From lazy range, D2:

import std.algorithm, std.conv, std.string;
void main() {
    string s = " 15 27";

    // splitter doesn't work yet for this
    // (here it doesn't strip the leading space)
    //auto xy = map!(to!int)(splitter(s));

    auto xy = map!(to!int)(split(s));
    int x = xy[0];
    int y = xy[1];
}

In Python:

>>> x, y = map(int, " 15 27".split())
>>> x
15
>>> y
27
>>> from itertools import imap
>>> x, y = imap(int, " 15 27".split())
>>> x
15
>>> y
27

Proposed:

import std.algorithm, std.conv, std.string;
void main() {
    string s = " 15 27";
    (int x, int y) = map!(to!int)(splitter(s));
}

Another example of unpacking a lazy range is to unpack the result of match(x, r"...").captures.

In my opinion unpacking a dynamic array in variables is natural in D because D already contains something that logically is very similar or almost equal:
foo[] a1 = [1, 2];
int[2] a2 = a1;

This performs a run-time test on the length of a1.

In Python3 there are even ways to unpack only part of a tuple:
a, b, *rest = (1, 2, 3, 4)

More thinking is needed. Please help me answer what Walter was asking for.
And please Walter, if you have more specific questions, it's good moment to ask.

Bye,
bearophile