Suggested std.variant.Algebraic redesign

Sat Feb 23 13:44:18 PST 2013

Here I propose a different usage syntax for 
std.variant.Algebraic. I explain why with a series of steps. The 
proposed syntax is at the bottom.

- - - - - - - - - - - - - - - - - - - -

To start with an example of a simple and common algebraic data 
type, this is in F# language, and it's used in a Decision Tree 
classifier (other functional languages use a similar syntax):

type Tree =
     | Conclusion of string
     | Choice of string * (string * Tree) []

- - - - - - - - - - - - - - - - - - - -

In Haskell the definition of that type is similar:

data Tree = Conclusion String
             | Choice (String, [(String, Tree)])

Or more simply (here Choice doesn't build a tuple):

data Tree = Conclusion String
             | Choice String [(String, Tree)]

t = Choice "Sci-Fi"
            [("No", Choice "Action"
                           [("Yes", Conclusion "Stallone"),
                            ("No", Conclusion "Schwarzenegger")]),
             ("Yes", Conclusion "Schwarzenegger")]

- - - - - - - - - - - - - - - - - - - -

In D if you don't add class constructors and toString that's 
similar to:

abstract class Tree {}

final class Conclusion : Tree {
     string result;
}

final class Choice : Tree {
     string key;
     Tuple!(string, Tree)[] branches;
}

- - - - - - - - - - - - - - - - - - - -

A better D version (I have used opCall to have less noisy tree 
literals):

import std.typecons: Tuple, tuple;
import std.string: format;
import std.array: replace;

abstract class Tree {
     override string toString() const;
}

final class Conclusion : Tree {
     string result;

     static typeof(this) opCall(typeof(result) result_) {
         auto r = new typeof(this)();
         r.result = result_;
         return r;
     }

     override string toString() const {
         return `Conclusion("` ~ result ~ `")`;
     }
}

final class Choice : Tree {
     string key;
     Tuple!(string, Tree)[] branches;

     static typeof(this) opCall(typeof(key) key_,
                                typeof(branches) branches_) {
         auto r = new typeof(this)();
         r.key = key_;
         r.branches = branches_;
         return r;
     }

     override string toString() const {
         return format(`Choice("%s", %s)`, key, branches)
                .replace("const(Tuple!(string, Tree))", "B");
     }
}

void main() {
     import std.stdio;
     alias B = typeof(Choice.branches[0]);

     auto t = Choice("Sci-Fi",
                     [B("No", Choice("Action",
                                     [B("Yes", 
Conclusion("Stallone")),
                                      B("No", 
Conclusion("Schwarzenegger"))])),
                      B("Yes", Conclusion("Schwarzenegger"))]);

     writeln(t);
}

- - - - - - - - - - - - - - - - - - - -

The B type is necessary because the literal:
tuple("foo", Conclusion("bar"))

is of type:
Tuple!(string, Conclusion)

Instead of a type similar to this as in Haskell:
Tuple!(string, Tree)

- - - - - - - - - - - - - - - - - - - -

Currently in D you can't use a std.variant.Algebraic for this 
common situation because it doesn't support recursive data types. 
Once Algebraic supports recursive types, I think it also should 
support field names; maybe like this:

alias Tree = Algebraic!(
     string, "Conclusion",
     Tuple!(string, Tuple!(string, Tree)[]), "Choice"
);

(Is it possible to create an Algebraic that supports recursive 
data types? Inside the definition of Tree there is a reference to 
the Tree identifier that will be created by the alias.)

- - - - - - - - - - - - - - - - - - - -

But putting the names on the front as in F#/OCaML/Haskell and 
other functional languages seems more standard and more clear:

alias Tree = Algebraic!(
     "Conclusion", string,
     "Choice", Tuple!(string, Tuple!(string, Tree)[])
);

- - - - - - - - - - - - - - - - - - - -

The names are able to tell apart the fields so there is no need 
to wrap the second-level types in a Tuple, as in F# (so after the 
string that represents the field name there is one or more types 
separated by a comma), this seems better and good enough:

alias Tree = Algebraic!(
     "Conclusion", string,
     "Choice", string, Tuple!(string, Tree)[]
);

- - - - - - - - - - - - - - - - - - - -

If that idea goes well, then I think the equivalent D version 
becomes:

import std.typecons: Algebraic;

alias Tree = Algebraic!(
     "Conclusion", string,
     "Choice", string, Tuple!(string, Tree)[]
);

void main() {
     import std.stdio;
     alias Choice = Algebraic.Choice;
     alias Conclusion = Algebraic.Conclusion;

     auto t = Choice("Sci-Fi",
         [tuple("No", Choice("Action",
             [tuple("Yes", Conclusion("Stallone")),
              tuple("No", Conclusion("Schwarzenegger"))])),
          tuple("Yes", Conclusion("Schwarzenegger"))]);

     writeln(t);
}

This D code is not as good as equivalent Rust/Scala code, but I 
think it's acceptable. All this assumes it's possible to create 
an Algebraic that supports recursive data types.

Here I think the alias B is not necessary, because Choice and 
Conclusion return a value of type Tree, so tuple("x", 
Conclusion("y")) is of type Tuple!(string, Tree) and the arrays 
are of the correct type for Choice.

- - - - - - - - - - - - - - - - - - - -

If a more succinct tuple literal syntax will be introduced, that 
code will look even more like functional language code (with just 
few more parentheses that increase the noise a little. On the 
other hand similar nested literals are not common in code, they 
are hard to write in Haskell too. Such data structures are 
usually built by the code).

- - - - - - - - - - - - - - - - - - - -

Better ideas are welcome :-)

Bye,
bearophile