A proposal: Sumtypes

Richard Andrew Cattermole (Rikki) richard at cattermole.co.nz
Thu Feb 8 15:42:25 UTC 2024


Yesterday I mentioned that I wasn't very happy with Walter's 
design of sum types, at least as per his write-up in his DIP 
repository.
I have finally after two years written up an alternative to it, 
that should cover everything you would expect from such a 
language feature.
There are also a couple of key differences with regard to the 
tag and ABI that should make value type exceptions (aka zero cost 
exceptions) work fairly fast.

A summary of features:

- Support for both a short-hand declaration syntax similar to the 
ML family and the enum-like syntax proposed by Walter, both with 
UDAs.
- The member of operator refers to the tag name
- Proposed match parameters for both name and type (although 
matching itself is not proposed)
- Copy constructors and destructor support
- Flexible ABI, if you don't use it, you won't pay for it (i.e. 
no storage for a value or function pointers for copy 
constructor/destructor)
- Default initialization using first entry or preferred ``:none``
- Implicit construction based upon value and using assignment 
expression to prefer existing tag
- Does not have the null type state
- Comparison based upon tag, and only then value
- Introspection (traits and properties)
- Set operations (merging, checking if type/name is in the set)
- No non-introspection method to access a sum type value is 
specified currently; a follow-up matching proposal would offer 
one instead.
   It can be done using the trait ``getMember``, although it is 
up to you to validate that it is the correct entry given the 
tag for a value.

Latest version: 
https://gist.github.com/rikkimax/d25c6b2bed8caba008a6967e9e0a7e7c

Walter's DIP: 
https://github.com/WalterBright/DIPs/blob/sumtypes/DIPs/1NNN-(wgb).md

Example nullable:

```d
sumtype Nullable(T) {
     :none,
     T value
}

sumtype Nullable(T) = :none | T value;

void accept(Nullable!Duration timeout) {}

accept(1.minute);
accept(:value = 1.minute);
accept(:none);
```

The following is a copy of the proposed member of operator and 
then the sumtype for posterity's sake.

------------------------

PR: https://github.com/dlang/dmd/pull/16161

# Member Of Operator

The member of operator is an operator that operates on a 
contextual type with respect to a given statement or declaration.

It may appear as the first term in an expression, and may then be 
followed by binary and dot expressions.

The syntax of the operator is ``':' Identifier``.

## Context

The context is a type that is provided by the statement or 
relevant declaration.

## Validation

The type that the member of operator results in is the same as 
the one it is in context of.

If it does not match, it will error.

## Valid Statements and Declarations

- Return expressions
     The compiler rewrites ``return :Identifier;`` as ``return 
typeof(return).Identifier;``.
- Variable declarations
     Type qualifiers alone may not appear as the variable type; 
there must be a concrete type.
     It can be thought of as aliasing the variable's type, with 
that alias used both as the variable type and as the context.
     ``Type var = :Identifier;`` would internally be rewritten as 
``__Alias var =  __Alias.Identifier;``.
- Switch statements
     The type of the expression used by the switch statement will 
need to be aliased, as per variable declarations.
     So
     ```d
     switch(expr) {
         case :Identifier:
             break;
     }
     ```
     would be rewritten as
     ```d
     alias __Alias = typeof(expr);
     switch(expr) {
         case __Alias.Identifier:
             break;
     }
     ```
- Function calls
     During parameter to argument matching, the compiler checks 
whether ``typeof(param).Identifier`` is possible for 
``func(:Identifier)``.
- Function parameter default initialization
     It must support the default initialization of a parameter. 
``void func(Enum e = :Start)``.
- Comparison
     The left hand side of a comparison is used as the context for 
the right hand side ``e == :Start``.
     This may require an intermediary variable, so that the type of 
the left hand side is known prior to the comparison.
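
To make the rewrites above concrete, here is a minimal sketch in 
today's D of the code the compiler would produce for each context. 
The enum ``Stage`` and the function names are hypothetical and 
exist only for illustration; ``Ctx`` stands in for the hidden 
``__Alias``.

```d
// Hypothetical enum used only for this illustration.
enum Stage { Start, Middle, End }

// `return :Start;` would be rewritten to the line below.
Stage initial() {
    return typeof(return).Start;
}

// `case :End:` would be rewritten to `case __Alias.End:`.
bool isFinished(Stage s) {
    alias Ctx = typeof(s); // stands in for the hidden __Alias
    switch (s) {
        case Ctx.End:
            return true;
        default:
            return false;
    }
}

// `Stage e = :Start` as a parameter default, and `e == :Start`
// with the left hand side providing the context.
bool isInitial(Stage e = Stage.Start) {
    return e == typeof(e).Start;
}
```

All of this compiles today; the operator only removes the need to 
spell out the type at each use.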

------------------------

Depends upon: [member of 
operator](https://gist.github.com/rikkimax/9e02ad538d94615d76d869070f7fd65f)

# SumTypes

Sum types are a union of types, as well as a union of names.
Some names will be applied to a type, others may not be.

It acts as a tagged union, using a tag to determine which type or 
name is currently active.

The matching capabilities are not specified here.

It is influenced by Walter Bright's DIP, although it is not a 
continuation of it.

## Syntax

Two new declaration syntaxes are proposed.

The first comes from Walter Bright's proposal:

```d
sumtype Identifier (TemplateParameters) {
     @UDAs|opt Type Identifier = Expression,
     @UDAs|opt Type Identifier,
     @UDAs|opt MemberOfOperator,
}
```

TODO: swap for spec grammar version

The second is short hand which comes from the ML family:

```d
sumtype Identifier (TemplateParameters) = @UDAs|opt Type 
Identifier|opt | @UDAs|opt MemberOfOperator;
```

TODO: swap for spec grammar version

For a nullable type, the two syntaxes would look like this:

```d
sumtype Nullable(T) {
     :none,
     T value
}

sumtype Nullable(T) = :none | T value;
```

## Member Of

A sumtype is a kind of tagged union.
It uses a tag to differentiate between each member.
The tag is a hash of both the fully qualified name of the type 
and the name.

The tag should be stored in a CPU word size register, so that if 
only names and no types are provided, there will be no storage.

When the member of operator applies to a sumtype, it looks up the 
entry whose name matches the given identifier in the list of names.
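
As a rough sketch of what such a tag could look like, the code 
below hashes the fully qualified type name together with the entry 
name at compile time. The proposal does not name a hash function 
or say how the two strings are combined; FNV-1a, the ``:`` 
separator, and the helpers ``fnv1a``, ``tagOf`` and ``tagOfName`` 
are all assumptions made for illustration.

```d
import std.traits : fullyQualifiedName;

// FNV-1a, picked only because it is short and works in CTFE.
// The proposal does not specify which hash is used.
size_t fnv1a(string s) pure nothrow @nogc @safe {
    static if (size_t.sizeof == 8) {
        size_t h = 14695981039346656037UL;
        enum size_t prime = 1099511628211UL;
    } else {
        size_t h = 2166136261U;
        enum size_t prime = 16777619U;
    }
    foreach (char c; s) {
        h ^= c;
        h *= prime;
    }
    return h;
}

// Hypothetical helper: tag for an entry that has a type and a name.
enum size_t tagOf(T, string name) =
    fnv1a(fullyQualifiedName!T ~ ":" ~ name);

// Hypothetical helper: tag for a name-only entry such as `:none`.
enum size_t tagOfName(string name) = fnv1a(":" ~ name);
```

Because the tag depends only on the type and the name, the same 
entry gets the same tag in every sumtype that contains it.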

## Proposed Match Parameters

There are two forms that need to be supported.

Both of which support a following name identifier that will be 
used for the variable declaration in the given scope.

1. The first is the type
2. Second is the member of operator to match the name

It is recommended that if you can have conflicts to always 
declare entries with names and to always use the names in the 
matching.

```d
obj.match {
     (:entry varName) => writeln(varName);
}
```

If you did not specify a type, you may not use the renamed 
variable declaration for a given entry nor specify the entry by 
the type.

It will of course be possible to specify an entry based upon the 
member of operator.

```d
sumtype S = :none;

identity(:none);

S identity(S s) => s;
```

As a feature this is otherwise known as implicit construction, and 
it applies to types in general in any location, including function 
arguments.

## Storage

A sumtype at runtime is represented by a flexible ABI.

1. The tag [``size_t``]
2. Copy constructor [``function``]
3. Destructor [``function``]
4. Storage [``void[X]``]

The tag always exists.

If none of the entries has a copy constructor (including 
generated), this field does not exist.

If none of the entries has a destructor (including generated), 
this field does not exist.

If none of the entries takes any storage (so all entries do not 
have a type), this field does not exist.

Entries that do not provide a copy constructor or destructor, but 
need one, will have a function generated for them, internal to the 
object file, that performs the appropriate action (and, should we 
get reference counting, performs that as well).

For all intents and purposes a sum type is similar to a struct as 
far as when to call the copy constructors and destructors.
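
A minimal sketch of the flexible layout, written as a struct 
template in today's D. The name ``Layout``, its parameters, and 
the function pointer signatures are assumptions; the proposal only 
lists the four fields and when each one is omitted. ``ubyte[N]`` 
stands in for the ``void[X]`` storage.

```d
// Hypothetical layout helper: a field exists only when some entry
// needs it, so unused features cost no space.
struct Layout(bool hasCopy, bool hasDtor, size_t storageSize) {
    size_t tag; // always present

    static if (hasCopy)
        void function(void* dst, const(void)* src) copyCtor;

    static if (hasDtor)
        void function(void* obj) dtor;

    static if (storageSize > 0)
        ubyte[storageSize] storage; // the proposal lists void[X]
}

// A sumtype made only of names, e.g. `:none | :some`.
alias NamesOnly = Layout!(false, false, 0);

// A sumtype whose entries need copying, destruction and 16 bytes.
alias Full = Layout!(true, true, 16);
```

With this sketch, ``NamesOnly.sizeof`` equals ``size_t.sizeof``, 
which is the case the tag-in-a-register design is aiming for.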

## Initialization

The default initialization of a sumtype will always prefer 
``:none`` if present, otherwise it is the first entry.
The short hand syntax does not support expressions for default 
initialization, so there the first entry is initialized to the 
default value of its type.

Assigning a value to a sum type will always prefer the currently 
selected tag.
If however the value cannot be coerced into the tag's type, it 
will then do a match to determine the best candidate based upon 
the type of the expression.

An example of preferring the currently selected tag:

```d
sumtype S = int i | long l;

S s = :i = 2;
```

But if we switch to a larger value ``s = long.max;``, this will 
assign the long instead.
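
The assignment rule can be approximated with a hand-written tagged 
union in today's D. This is only a sketch: ``IntOrLong`` and 
``assign`` are hypothetical names, and a runtime range check 
stands in for the compile-time "can this be coerced" check that 
the proposal describes.

```d
// Hand-rolled stand-in for `sumtype S = int i | long l;`.
struct IntOrLong {
    enum Tag { i, l }
    Tag tag; // defaults to the first entry, as in the proposal
    union { int i; long l; }

    // Prefer the current tag; fall back to the best match otherwise.
    void assign(long value) {
        final switch (tag) {
            case Tag.i:
                if (value >= int.min && value <= int.max) {
                    i = cast(int) value; // fits, keep the tag
                    return;
                }
                tag = Tag.l; // does not fit, switch to long
                l = value;
                return;
            case Tag.l:
                l = value; // already the widest entry
                return;
        }
    }
}
```

Assigning ``2`` to a default-initialized ``IntOrLong`` keeps the 
``i`` entry active, while assigning ``long.max`` switches it to 
``l``.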

## Nullability

A sum type cannot have the type state of null.

## Set Operations

A sumtype which is a subset of another will be assignable to it.

```d
sumtype S1 = :none | int;
sumtype S2 = :none | int | float;

S1 s1;
S2 s2 = s1;
```

This covers other scenarios like returning from a function or an 
argument to a function.

To remove a possible entry from a sumtype you must perform a match 
(which is not being proposed here):

```d
sumtype S1 = :none | int;
sumtype S2 = :none | int | float;

S1 s1;
S2 s2 = s1;

s2.match {
     (float) => assert(0);
     (default val) => s1 = val;
}
```

To determine if a type is in the set:

```d
sumtype S1 = :none | int;

pragma(msg, int in S1); // true
pragma(msg, :none in S1); // true
pragma(msg, "none" in S1); // true
```

To merge two sumtypes together use the pipe operator on the type.

```d
sumtype S1 = :none | int i;
sumtype S2 = :none | long l;
alias S3 = S1 | S2; // :none | int i | long l
```

Or you can expand a sumtype directly into another:

```d
sumtype S1 = :none | int i;
sumtype S2 = :none | S1.expand | long l; // :none | int i | long l
```

When merging, duplicate types and names are not an error; they 
will be combined.
However, if the same name appears with two different types, this 
is an error.
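
For comparison, the type-level part of these set operations can be 
approximated today with the library type ``std.sumtype.SumType``. 
This is not the proposed feature: it has no names, so the sketch 
covers types only, and ``None``, ``Merge`` and ``contains`` are 
hypothetical helpers.

```d
import std.meta : NoDuplicates, staticIndexOf;
import std.sumtype : SumType;

struct None {} // stand-in for the `:none` entry

alias S1 = SumType!(None, int);
alias S2 = SumType!(None, long);

// Merge: concatenate both member lists and drop duplicate types.
alias Merge(A, B) = SumType!(NoDuplicates!(A.Types, B.Types));
alias S3 = Merge!(S1, S2); // None | int | long

// Membership: is T one of the types that S can hold?
enum bool contains(S, T) = staticIndexOf!(T, S.Types) != -1;
```

Here ``contains!(S3, long)`` plays the role of ``long in S3``, and 
``Merge!(S1, S2)`` the role of ``S1 | S2``.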

## Introspection

A sumtype includes all primary properties of types including 
``sizeof``.

It has one new property, ``expand``, which is used to expand a 
sumtype into the one currently being declared.

The trait ``allMembers`` will return a set of strings that denote 
the names of each entry. If an entry has not been given a name by 
the user, a generated name will be provided that will access it 
instead.

Using the trait ``getMember`` or using ``SumType.Member`` will 
return an alias to that entry so that you may acquire the type of 
it, or assign to it.

For the trait ``identifier`` on an alias of a given entry, it 
will return the name for that entry.

An is expression may be used to determine if a given type is a 
sumtype: ``is(T == sumtype)``.

## Comparison

The comparison of two sum types is first done based upon the tag; 
if the tags are not equal, their ordering gives the less than or 
greater than result.

Should the tags be equal, a match will occur, with the comparison 
behavior of the given entry type producing the final value.
If a given entry does not have a type, the two compare as equal.
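
The two-step comparison can be sketched with a hand-written 
``opCmp`` in today's D. ``NoneOrInt`` and its small integer tags 
are hypothetical; in the proposal the tags are hashes, so the 
ordering between different entries would follow the hash values 
rather than declaration order.

```d
// Hand-rolled stand-in for `sumtype S = :none | int value;`.
struct NoneOrInt {
    size_t tag; // 0 = :none, 1 = int value (illustrative tags)
    int value;

    int opCmp(const NoneOrInt rhs) const {
        // Step 1: different tags decide the result on their own.
        if (tag != rhs.tag)
            return tag < rhs.tag ? -1 : 1;
        // Step 2: an entry without a type always compares equal.
        if (tag == 0)
            return 0;
        // Step 3: same typed entry, compare the stored values.
        return value < rhs.value ? -1 : (value > rhs.value ? 1 : 0);
    }
}
```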

