draft proposal for Sum Types for D

Wed Nov 30 14:20:31 UTC 2022

On Tuesday, 29 November 2022 at 06:26:20 UTC, Walter Bright wrote:
> Go ahead, Make My Day! Destroy!
>
> https://github.com/WalterBright/DIPs/blob/sumtypes/DIPs/1NNN-(wgb).md

## Feedback

1. In *Alternative Syntax* the following is not supported by the 
grammar you provided:
```d
sumtype Option(T) { None, Some(T) }
```
It should probably be
```d
sumtype Option(T) { None, T Some }
```

2. Maybe you should mention anonymous `sumtype` (cf. anonymous 
`struct` and anonymous `union`).

3. As for a keyword, you could circumvent the problem by re-using 
existing keywords. `enum union` would be a candidate.

4. “Members of sumtypes cannot have copy constructors, postblits, 
or destructors.” Limits its application severely. Andrei had a 
talk explaining how destruction of `std::variant` in C++ could be 
done efficiently using `static foreach` – if C++ had it. I don’t 
see how a copy constructor/postblit is evil.

5. “Sumtypes can be copied *if all its constituent types can be 
copied*.”

## Discussion

It seems it tries to be too many things at once and that is 
confusing. It tries to be:
1. An algebraic union with a tagged union as its special case
2. An optional type
3. Some form of non-nullable annotation.

Let’s talk about 3. first: The recognition of a magical pattern 
is bad; it is backwards and leads to surprises. Were reference 
types non-nullable from day one, `enum union { typeof(null), int* 
}` would be a great candidate for a nullable pointer to `int`.

The same way `typeof(return)` is special-cased in the grammar 
(the token `return` not being an expression), you could at least 
special-case `null` (the keyword) as a possible case of a D sum 
type. However, this is still a confusing design because it looks 
like you’re *adding* a case, but actually it *removes* a value 
form the overall type. I’ve heard of negative types in type 
theory before, but to be honest, I don’t know much about it. 
Better than seemingly *adding* `null` would be to add a 
(visually) negative `null`: Allow the pseudo-member `!null` or 
`-null`. It’s still weird and maybe even hard to teach, but at 
least we can tell people: “The same way you can *add* something 
to 2 to make it 1, namely −1, you can *add* `-typeof(null)` to 
`int*`, you get an `int*` that cannot be `null`. For your 
convenience, instead of `-typeof(null)` you can write `-null`.”

A sum type consisting of at least one reference type (pointers, 
classes, AAs) may include `-null` as a special member that makes 
the nullable options non-nullable.

We can even add syntax for simple non-nullable reference types: 
`int*-null` or `int*\null` or `int*!` or whatever you like. 
Likely, it’d be a lowering to a template in object.d named 
`nonnull` or something similar.

As for 1. and 2., those make sense. Adding a single, distinct 
value to a type makes this an optional type. A template 
`optional` in Phobos would become a vocabulary type and makes 
people not implement their own optionals and be implemented using 
a sum type. This could even be added to object.d with syntax.

I still think that D would be a better and safer language if it 
had non-null reference types: In such a hypothetical D, an `int*` 
always points to an `int`, but an `int*?` might be `null`. For 
value types, something like `int?` can be done in two ways: 
Either reserve an otherwise invalid value as the null value 
(possibly −2³²) or make `int?` an alias for `enum union 
NullableInt { typeof(null), int }` – the first option is a 
non-starter because of backward compatibility, but an entirely 
new language could do that. User-defined types can have a 
compile-time constant `opNull` that tells the compiler: “Not all 
combinations of values for my members signify a valid object; use 
this as the ‘null’ object instead of adding a boolean tag when 
I’m combined with `?`.” It’s probably worth exploring how much of 
that can be done and to what cost.

Why all this talk about optional types? Because they play well 
with sum types. Asking for a possible variant of a sum type is 
inherently returning an optional value. The first step to getting 
sum types right is getting optional types right. That doesn’t 
mean optional types cannot be a special case of sum types. In 
that sense, optional types are the most relevant application of 
sum types.

```diff
   EnumUnionDeclaration:
       `enum` `union` Identifier EnumUnionBody
       EnumUnionTemplateDeclaration
+     AnonymousEnumUnionDeclaration
+
+ AnonymousEnumUnionDeclaration:
+     `enum` `union` EnumUnionBody

   EnumUnionTemplateDeclaration:
       `enum` `union` Identifier TemplateParameters Constraint 
(opt) EnumUnionBody

   EnumUnionBody:
       `{` EnumUnionMembers `}`

   EnumUnionMembers:
       EnumUnionMember
       EnumUnionMember `,`
       EnumUnionMember `,` EnumUnionMembers

   EnumUnionMember:
+     'null'
+     '!null'
       EnumMemberAttributes EnumUnionMember
       EnumMember
       FieldDeclaration

   EnumMember:
-     Identifier
-     Identifier = AssignExpression
+     case Identifier
+     case Identifier = AssignExpression

   FieldDeclaration:
-     Type Identifier
-     Type Identifier = AssignExpression
+     Type Identifier(opt)
+     Type Identifier(opt) = AssignExpression

   QueryExpression:
       `?` PostfixExpression `.` Identifier
```

I mentioned `null` and `!null` above. `null` is a shorthand for 
`typeof(null)`. Omitting the `Identifier` is allowed only when 
the `EnumMember` is the only one with its type and the 
declaration is not anonymous.

Basically we have these categories of sum types:
* Optionals: 1 member (usually unnamed) plus `null`.
* Non-nulls: 1 member (usually unnamed) of reference type minus 
`null` (= plus `!null`).
* Commutative: ≥ 2 members (usually unnamed) with different types.
* Homogeneous: ≥ 2 named members with the same type.
* Algebraic: All members named, potentially with repeated types, 
potentially self-referential.
* Non-null commutative: ≥ 2 members (usually unnamed) with 
different types among which are reference types plus `!null`
* Non-null algebraic: All members named, potentially with 
repeated types among which are reference types (potentially 
self-referential) plus `!null`.

A member can be an untyped named constant. Those are equivalent 
to members of unique unit type. But if types are optional and 
identifiers are optional, how do we know which is which? We 
don’t. We need something to disambiguate. I went with `case` for 
something that’s maybe acceptable. Here, `case` can be read as a 
type: “Make a unique case type for this.” The `case` in `case x = 
init` is `Case_x` defined as `struct Case_x { typeof(init) value 
= init; }`. Because the type is a unit type, its value need not 
be stored in the instance.
If all members are `case` members, the type is effectively a 
plain-old `enum`. The union (not the tag field) can be elided as 
the tag field suffices to retrieve the values. The only 
difference to regular `enum` I see is that the DIP proposes that 
for the tag field, the smallest available integral type be used, 
whereas `enum` by default has `int` values.

I call the (usually unnamed) type-distinguished sum types 
commutative because there’s no reason whatsoever to treat `int + 
string` and `string + int` differently. There is *the* `int` 
member and *the* string member. It’s the same type spelled 
differently the same way ordering of type constructors or 
function attributes (on e.g. function pointer types) not only 
makes no difference, but creates the very same type. One does not 
simply reorder a struct, but one can reorder a union and 
commutative sum types are close enough. Algebraic sum types are 
too much struct-like to be reordered (naming matters, knowing the 
type to extract does not suffice).

The member querying syntax is okay, I guess it can be improved. 
While `value.member?` reads best, it meaningfully conflicts with 
the trinary operator, so the second best would be 
`value.?member`. If a sum type is an optional type, i.e. it is 
`enum union X {null, T}` for some `T`, it’d be great to have some 
C#-esque `?.` and `??` operators as well: `optionalcat?.name` 
returns an optional string: `null` if there is no cat and the 
cat’s name if there is a cat. This conflicts with the trinary 
operator as well, but the `?` followed by module-scope `.` 
without a space should be virtually non-existent. For a nullable 
value `v`, `v ?? d` returns `v` if it’s not `null` (but typed as 
not null) and `d` (default, lazy evaluated) if it is. A `.` could 
implicitly be `.?` if the member after it is followed up by `??` 
or `?.`, i.e. `value.member?.toString()` is actually 
`value.?member?.toString()` and `yourpet.cat ?? mycat` is 
actually `yourpet.?cat ?? mycat` because you might not have a cat.

Querying for a commutative sum type would usually be done via the 
type: The sum type converts implicitly to an optional of any of 
its constituent types:
```d
// object.d
enum union SumType(Ts...) { Ts }
enum union NotNull(Ts...) if (/* any is reference type */) { 
!null, Ts }
enum union Nullable(T) if (/* T is not reference type and has no 
opNull */) { null, T }
// main.d
StringOrInt = SumType!(string, int);
StringOrInt stringOrInt;
if (string? s = stringOrInt) { }
else (int? i = stringOrInt) { }
else assert(0);
```
Also, `stringOrInt is int` and `stringOrInt !is int` would work.

For an associative array, `aa[key]` can return an optional value 
instead of throwing  a `RangeError`. That would enable `aa[key] 
?? d`; and `aa[key] ??= value` can set the value only if `key` is 
not present already.

For a (nullable) pointer `ptr`, the dereference expression `*ptr` 
can be treated like an optional value if it is followed by `??` 
or `?.`, and ptr?.member

I see one problem with `?.` vs. `.?`, but the rule is easy: The 
question mark is where the optional is. For `.?`, the left 
operand is a concrete value and the member on the right side 
might or might not be present. If optional sum types are a common 
thing, maybe we need `?.?`: `lhs?.?member` is: “Give me the 
member’s value if `lhs` isn’t `null` and `member` is there; 
otherwise give me `null`.”

Sorry for the long post.